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We  consider  the  spherically  projected  multivariate  linear  (SPML)  model  for 
directional  data  in  the  general  d-dimensional  case.  This  model  treats  directional 
observations  as  projections  onto  the  unit  sphere  of  unobserved  responses  from  a 
multivariate  linear  model.  We  show  that  maximum  likelihood  estimates  for  the 
model  are  readily  computed  using  iterative  methods  such  as  the  Newton-Raphson 
and  EM  algorithms.  Under  the  model  we  develop  three  tests  of  hypotheses, 
namely,  a test  for  equal  mean  directions  of  several  populations  assuming  the 
same  unknown  concentration  of  the  populations,  a test  for  the  assumption  of  a 
common  concentration,  and  a test  for  equal  mean  vectors.  The  performance  of 
the  tests  in  terms  of  both  size  and  power  is  investigated  through  simulations.  An 
example  with  three-dimensional  anthropological  data  is  presented  to  demonstrate 
the  application  of  the  tests.  We  also  introduce  the  one-way  random  effects  model 
for  directional  data.  We  consider  four  different  methods  for  computing  the  MLEs  of 
the  parameters,  including  Markov  chain  Monte  Carlo  EM  (MCMCEM)  and  Gauss- 
Hermite  approximation  methods.  We  also  define  a general  mixed  effects  model  for 
directional  data,  including  the  special  case  of  a random  intercept  model,  and  show 


how  estimation  can  be  done  using  an  MCMCEM  algorithm.  A formula  for  the 
mean  resultant  length  of  the  projected  normal  distribution,  which  is  the  underlying 
distribution  for  the  SPML  model,  for  the  general  d-dimensional  case  is  presented. 
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CHAPTER  1 
INTRODUCTION 


Observations  representing  directions  occur  in  many  sciences.  For  example,  in 
geology,  the  orientation  of  crystals  is  described  as  directions  in  the  three-dimensional 
space  (see  Vistelius,  1966);  in  biology,  butterfly  migration  leads  to  directions  in 
two  dimensions  (see  Walker  and  Littell,  1994);  in  meteorology,  the  wind  direction 
is  a source  of  three-dimensional  directions  (see  Fisher  et  al.,  1987);  in  astronomy, 
the  normals  to  the  orbital  planes  of  comets  are  regarded  as  directions  in  three 
dimensions  (see  Fisher  et  ah,  1987);  in  image  analysis,  two-dimensional  directions 
occur  in  machine  vision  (see  Mardia  et  al.,  1996);  in  medicine,  three-dimensional 
directions  occur  in  vector  cardiograms  (see  Downs  and  Liebman,  1969).  Although 
most  applications  involve  directions  measured  in  two  and  three  dimensions,  in  this 
dissertation  we  consider  the  general  d-dimensional  case,  d > 1.  There  are  several 
different  ways  to  specify  directions  in  d dimensions.  For  instance,  directions  in  the 
plane  can  be  regarded  as  unit  vectors,  angles  or  unit  complex  numbers.  In  this  work 
we  represent  directions  by  unit  vectors  and  thus  the  sample  space  is  the  d-dimensional 
unit  sphere  5d_1  :=  {u  e M.d  : ||u||  = 1},  where  ||-||  is  the  usual  Euclidean  norm. 
Special  statistical  methods  are  required  for  the  analysis  of  such  data,  which  take  into 
account  the  special  structure  of  the  sample  space. 

Let  U be  a random  direction  in  Kd,  that  is  U G Sd~ 1 . There  are  a variety 
of  distributions  on  Sd~ 1 that  can  be  used  to  model  U . Some  of  the  most  popular 
are  the  von  Mises-Fisher,  Fisher-Bingham,  wrapped  normal  and  projected  normal 
distributions.  Such  distributions  are  usually  defined  by  their  density  functions  with 
respect  to  the  uniform  measure  (distribution)  on  Sd~ 1 . Important  characteristics 
of  the  distribution  of  U are  the  mean  direction,  77  = E(U)/\\E(U)  ||,  and  the 


1 
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mean  resultant  length  (MRL),  p = \\E  (C/)||,  where  E (U)  represents  the  usual 
componentwise  expectation  for  vectors.  Note  that  since  ||-||  is  a convex  function,  from 
Jensen’s  inequality  we  have  \\E  (U)  ||  < E||f7 |j  = 1.  Therefore,  unless  U is  a constant, 
E (U)  lies  in  the  interior  of  the  unit  sphere  and  thus  is  not  a direction  as  defined 
above.  It  is  natural  to  divide  E ( U ) by  its  length  p = \\E  (U)  ||  to  obtain  the  measure 
of  location,  77,  which  is  now  a unit  vector,  i.e.  a direction.  The  parameter  p serves  as 
a measure  of  concentration  for  directional  distributions.  Note  that  0 < p < 1,  with 
large  values  of  p indicating  high  concentration  and  hence  low  dispersion. 

Suppose  that  U 1, . • • , Un  is  a random  sample  of  unit  vectors.  Then  we  define 
the  sample  analogues  of  77  and  p as  Uq  = U/\\U\\  and  R = ||t7||  respectively,  where 
u = EILi  uti  n is  the  sample  mean  of  the  UiS. 

Now  we  give  a simple  example  of  two-dimensional  directions  taken  from  Mardia 
and  Jupp  (2000). 

Example  1:  A roulette  wheel  was  spun  nine  times  and  the  positions,  measured  in 
degrees,  at  which  it  stopped  were 

43, 45, 52,  61,  75, 88, 88, 279, 357. 

The  data  are  plotted  on  Figure  1.  In  Table  1-1  below  we  find  the  unit  vectors 
corresponding  to  the  observed  degrees. 

Then,  for  the  sample  mean,  mean  direction,  and  mean  resultant  length  we  find 
U = (0.447,0.553),  Uq  = (0.629,0.778),  and  R = 0.711,  which  we  have  also  plotted 
on  Figure  1.  The  MRL  indicates  a moderate  concentration  of  the  data. 

Notice  that  the  direction  represented  by  Uq  is  not  the  same  as  the  direction 
represented  by 

9 

e = 5^/9  = 121. 

In  fact,  the  average  of  the  angles  6 is  not  a good  measure  of  mean  direction  as 
it  strongly  depends  on  how  the  initial  direction  of  the  circle  is  selected.  On  the 
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Table  1-1:  The  roulette  data. 


i 

6i  u,  - (un,ui2) 

1 

43 

(0.731,  0.682) 

2 

45 

(0.707,  0.707) 

3 

52 

(0.616,  0.788) 

4 

61 

(0.485,  0.875) 

5 

75 

(0.259,  0.966) 

6 

88 

(0.035,  0.999) 

7 

88 

(0.035,  0.999) 

8 

279 

(0.156,  -0.988) 

9 

357 

(0.999,  -0.052) 

Figure  1-1:  The  sample  mean  U,  the  mean  direction  U0  and  the  mean  resultant 
length  R for  the  roulette  data. 

other  hand,  the  mean  direction  defined  by  Uo  is  not  affected  by  the  choice  of  initial 
direction. 

There  are  several  important  books  on  directional  statistics  that  give  an  excellent 
overview  of  the  main  developments  in  this  area:  Watson  (1983),  which  is  concerned 
with  the  theory  of  distributions  on  spheres  of  arbitrary  dimension;  Fisher  (1993)  and 
Fisher  et  al.  (1987),  which  focus  on  methods  for  analyzing  data  on  the  unit  circle 
and  the  unit  three-dimensional  sphere,  respectively;  and  Mardia  and  Jupp  (2000), 
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which  gives  a unified  and  comprehensive  account  of  both  the  underlying  theory  and 
practical  methodology  for  directional  data. 

Directional  data  often  appear  as  the  response  in  a regression  setting  with  scalar 
and  categorical  predictors.  A useful  model  for  describing  such  data  is  the  spherically 
projected  multivariate  linear  (SPML)  model  introduced  by  Presnell  et  al.  (1998). 
This  model  treats  the  directional  observations  as  projections  onto  the  unit  sphere  of 
unobserved  response  vectors  arising  in  a multivariate  linear  model.  Presnell  et  al. 
(1998)  focus  on  the  two-dimensional  case  of  circular  data,  for  which  they  present 
likelihood  calculations,  prove  that  the  log-likelihood  function  is  concave  and  give 
iterative  methods  for  finding  the  maximum  likelihood  estimates.  Presnell  et  al.  (1998) 
also  demonstrate  the  great  flexibility  of  the  SPML  model  and  contrast  this  with  the 
difficulty  of  fitting  other  previously  proposed  models.  The  SPML  model  is  the  main 
focus  of  this  dissertation. 

Chapter  2 defines  the  projected  normal  distribution  which  is  the  underlying 
distribution  of  the  SPML  model.  A formula  for  the  MRL  of  the  distribution  is  derived 
in  the  general  d-dimensional  case.  The  density  and  MRL  of  the  projected  normal 
distribution  in  three  dimensions  are  compared  with  those  of  the  most  extensively 
studied  distribution,  the  von  Mises-Fisher  distribution. 

Chapter  3 introduces  the  SPML  regression  model  in  the  general  d-dimensional 
case.  The  likelihood  calculations  and  methods  for  computing  MLEs  proposed  by 
Presnell  et  al.  (1998)  are  generalized  to  an  arbitrary  dimension. 

In  Chapter  4 a one-way  fixed  effects  model  for  directional  data  is  defined. 
Tests  for  equal  mean  directions  assuming  a common  concentration,  equal  mean 
vectors  and  equal  concentrations,  are  developed.  A new  approximation  to  the 
null  distribution  of  the  transformed  likelihood  ratio  statistic  for  testing  same  mean 
directions  with  a common  concentration  assuming  underlying  von  Mises-Fisher 
distribution  is  proposed.  An  example  involving  anthropological  data  demonstrates 
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the  applications  of  the  methods  in  three  dimensions.  Results  from  simulation  studies 
on  the  size  and  power  of  the  tests  are  presented  at  the  end  of  the  chapter. 

Chapter  5 defines  the  one-way  random  effects  model  for  directional  observations. 
Four  different  algorithms  for  estimating  the  parameters  of  the  model  are  considered: 
direct  likelihood  calculations  using  Gauss-Hermite  quadrature;  Markov  chain  Monte 
Carlo  EM  (MCMCEM)  algorithm;  Monte  Carlo  EM  algorithm;  and  finally,  a method 
based  on  the  results  from  Chapter  3.  The  implementation  of  the  first  two  methods 
is  illustrated  in  an  example  with  simulated  two-dimensional  data. 

Chapter  6 introduces  the  mixed  effects  model,  including  the  special  case  of  a 
random  intercept  model.  Estimation  is  performed  using  an  MCMCEM  algorithm. 

Finally,  Chapter  7 summarizes  the  results  and  discusses  future  research. 


CHAPTER  2 

PROJECTED  NORMAL  DISTRIBUTION 


2.1  Definition 


Suppose  that  the  random  vector  Y has  a d-variate  normal  distribution  with 
mean  /z  and  covariance  matrix  S,  which  we  indicate  with  the  standard  notation 
Y ~ Nd  (/x,  £).  Then  U = Y /||y||  is  a random  point  on  5d_1.  The  distribution  of 
U has  been  called  by  different  names:  an  offset  normal  Mardia  (1972),  a displaced 
normal  Kendall  (1974),  an  angular  Gaussian  Watson  (1983),  and  a projected  normal 
Small  (1996).  We  will  refer  to  it  as  the  projected  normal  distribution  and  denote  it  by 
PNd  (l-i.  X).  In  this  work  we  restrict  our  attention  to  projected  normal  distributions 
with  X = I,  the  identity  matrix.  The  density  function  of  U with  respect  to  the 
uniform  distribution  on  Sd~ 1 is  then  (see  Watson,  1983,  Appendix  C) 


where  7 = ||/z||  is  a concentration  parameter,  the  unit  vector  rj  = /z/7  is  the  mean 
direction,  T(-)  is  the  standard  gamma  function  and  <fi  is  the  standard  normal  density 
function.  For  convenience  we  will  call  n = 777  the  mean  vector  of  the  distribution. 

The  density  (2.1)  is  unimodal  and  rotationally  symmetric  about  77,  and  is  the 
form  of  the  projected  normal  distribution  most  studied  in  the  literature.  Next  we 
rewrite  this  density  in  a way  that  will  be  useful  for  the  derivations  in  Section  3.2.  Let 
be  the  kth  repeated  integral  of  the  standard  normal  distribution  function  4>,  i.e. 


/0+°°  rd  1(f)  (r  — ,yu'ri)  dr 
U f (|)  2 d/2~1ei2/2(f)  (- yu'r /) 


(2.1) 
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with  $_!  ( x ) = (j)  ( x ) (note  that  4>o  (x)  = $ (x)).  In  Appendix  A we  show  that 


1 f+oc 

(x)  = — / (2  — x)dz,  k = 0, 1, . . 

Jo 


(2.2) 


and  we  also  derive  the  recursive  formula 


/c$fc  (x)  = $fc_2  (z)  + x$fc_x  (x) , A:  = 0, 1, ... , 


(2.3) 


which  we  use  in  our  computer  programs  for  computing  the  maximum  likelihood 
estimate  of  r].  Using  (2.2)  in  (2.1),  we  obtain 

(d  - l)!$d_i  (7 u'rj) 


fu  (u)  r ^ 2d/2_le72/2(£  M, ■ 

For  the  derivations  in  Section  3.2  it  is  convenient  to  rewrite  (2.4)  as 

(<*-!)! 


(2.4) 


fu  (tt)  = r Si  2!/2-i  exP  {^-1  (T V)  - j J , 


(2.5) 


r (I)  2^- 

where  ipk  ( t ) = log  ($*  (t)  /0  (i)). 

2.2  The  Mean  Resultant  Length  of  the  Projected  Normal  Distribution 

The  formulas  for  the  MRLs  of  the  projected  normal  and  von  Mises-Fisher 
distributions  involve  Bessel  and  Kummer  functions.  The  modified  Bessel  function 
/„  (2)  of  the  first  kind  and  order  v is  defined  as  one  of  the  solutions  of  the  differential 
equation 

o d uj  duj  / o o\  n 
+ + = °' 

There  are  many  different  representation  of  /„  (2).  For  example,  it  can  be  written  as 


the  series 

or  as  the  integral 

h (z)  = 


(I)’ £ 

k=0 


(*)' 


k\T(v  + jfc  + 1)’ 


(i  r 


7T1/2r  (v  + i) 


i)  £ (i  - tT 


e2tdt,  for  1/  > — . 

2 
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The  Kuramer  function  M (a,  b,  z),  also  called  the  confluent  hypergeometric 
function,  is  defined  as  one  of  the  solutions  of  the  differential  equation 


d2LU 

zd?  + (b 


\ duj 

z)  — aw  = 0. 

dz 


There  exist  various  representations  of  M (a,  b,z).  For  instance,  it  can  be  written  as 
the  series 

M(abz)  = T 

where  (a)n  — a (a  + 1)  (a  + 2) . . . (a  + n — 1) , (a)0  = 1,  and  similarly  for  (6)n,  or  as 
the  integral 


M(a,b,z)=  rl,T{^rl  , f'e-t-Hl-tf-'di 
r (6  - a)  F (a)  J0 

if  b > a > 0.  For  more  details  on  modified  Bessel  and  Kummer  functions  see 
Abramowitz  and  Stegun  (1970,  p.  374  and  p.  504). 

In  the  circular  case  ( d = 2)  the  MRL  of  the  projected  normal  distribution  as  a 
function  of  the  concentration  parameter  7 was  given  by  Kendall  (1974)  as 

P = (y ) e_c  |J0  (0  + /*  (0] . (2.6) 

where  £ = 72/4.  Reuter  (1974)  found  an  alternative  expression  for  p,  again  for  d = 2: 


P = \M1/2e  Qm  (^’2’3)  ‘ 


In  Corollary  2.2  below  we  show  that  the  two  expressions  are  equivalent.  To  the  best 
of  our  knowledge,  no  such  formula  is  known  for  d > 2.  In  the  following  theorem  we 
present  a formula  for  p in  the  general  d-dimensional  case. 
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Theorem  2.1.  Let  U = (U\, . . . , Ud )'  ~ PNd  (n,  I)  , where  d > 1 and  ^ / 0. 
Then  the  MRL  of  U is  given  by 


where  M is  the  confluent  hypergeometric  function,  ( = 72/2,  and  7 = ||/i||. 

Proof.  The  projected  normal  distribution  can  be  obtained  by  projecting  the  multivariate 
normal  distribution  onto  the  unit  sphere.  Therefore,  we  can  write  U = Y /||  Y||, 
where  Y = (Vj, . . . , Yd)'  ~ Nd  (p,,  I).  Since  p depends  on  /i  only  through  the 
concentration  7 = ||/i||  > 0,  we  assume,  without  loss  of  generality,  that  /a  = 


Note  that  —1  < Uj  < 1 V j = 1 , ,d,  and  thus  Uj  is  integrable  for  all  j = 
1, . . . , d.  Since  for  all  j — 2, ... , d , Uj  is  symmetrically  distributed  about  0, 


which  simply  means  that  p is  the  expected  value  of  the  cosine  of  the  angle  between 
U and  77. 

We  first  consider  the  case  d = 1.  Note  that  in  this  case  we  have  a univariate 
random  variable  U which  takes  only  two  values,  —1  and  1,  with  probabilities  P(Y  > 0) 
and  P(Y  < 0),  respectively.  Therefore,  the  MRL  of  U is 


(2.7) 


(7,0,-..,0)'. 


E(Uj)  = 0Vj  = 2,...,d. 


Thus,  by  the  definition  of  MRL,  we  obtain 


p = || E (U)  ||  = (E?(Ui)  + E\U2 ) + • • • + E2(Ud))1/2  = | E (Ux) 


p = E (U)  = 1 P(Y  > 0)  + (-1  )P(Y  < 0)  = [1  - $ (-7)]  - $ (-7) 
= $ (7)  ~ [1  - $ (7)]  = 2$  (7)  - 1- 
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Now,  notice  that  for  d = 1,  formula  (2.7)  simplifies  to 

p=r(i  + i)  i i f) 

P r(i  + l)  V 2 y V2  + 2’2+  ’ 2 ) 


(1+1) 
r(l)7e-72//2 
r (!)  2V2 


Mil,-,— 

’2’  2 


76-72/2  M (l  - — 

2-%i/22i/2iW  V ’ 2’  2 


2-V2 


According  to  Abramowitz  and  Stegun  (1970),  formula  26.2.37, 


M 111  2’  2)  70(7) 


$ (7)  - g 


Thus,  (2.8)  becomes 


P = 


7e 


-7V2 


2-i/2^-l/2 


70  (7) 


$ (7)  - g 


(2.8) 


2-i/27ri/2(27r)-1/2e-72/2  ($(7')  2)  L 

which  is  exactly  the  MRL  of  U. 

Consider  now  the  case  d > 2.  Next  we  find  the  density  function  and  the  expected 


value  of  U\.  To  simplify  notation,  for  the  rest  of  the  proof  we  will  drop  the  subscript 
of  U\  and  write  just  U.  Notice  that  for  the  first  element  of  the  random  vector  U we 


have 

U = h 7 

(y.2 + eU yf) ' (v‘ ' X],‘ 

where  Y ~ N (7, 1)  is  independent  of  X ~ Xd-\  = F (^r,  |).  To  derive  the  density 
of  U,  we  first  make  a change  of  variables  to  obtain  the  joint  density  of  U and  R = 
(A  + F2)T  and  then  integrate  out  R to  obtain  the  marginal  density  of  U. 
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The  joint  density  of  X and  Y is 


fxAx,V ) = fx(x)fY(y) 

x(d-D/2-ie-x/2  x 


r(^i)  21/27ri/21' 

= Cix(d-3)/2e-^-2^+I)/2lK+><R  (x,  y ) , 


(y-7)2/2iK+xM(^,y) 


where  Cj  = 


2d/27ri/2r(^i) 


ppr.  Now,  let 


u 


(x  + y2)1/2  ’ 

= (x  + y2)1/2,  r > 0 


-1  < u < 1 


Then 


y — ru 


x = (r2-  y2)  = (r2  - rV)  = r2(l  - u2), 


and  the  Jacobian  of  this  transformation  is 


J = det 


V 


dx 

dx 

\ 

dr 

du 

dy 

dy 

J 

dr 

du 

= det 


2r(l  — u 2)  —2  ur2 


V 


u 


= 2r2(l  — u2)  + 2u2r2  = 2r2. 
Therefore,  the  joint  density  of  U and  R is 


fuAu,r)  = fx,Y{x,y)\J\ 

= cAd-3)/2e-{y2-2^+x)/2l&+xU  (x,  y ) | J\ 

= a [r2{  1 - u2)](d"3)/2e-[r2u2-27ru+r2(1-“2)]/22r2l(_1,1)xR+  (u,r) 
= ^(1  - «2)(d-3)/2e-(''2-27™)/2l(_1>1)xR+  («,r) , 


where  c-i  = 2ci  = e 72/2  [2^d  2)/27r1/2T  (^2^)]  *■ 
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For  the  marginal  density  of  U,  we  obtain 

r+oo 

fu(u)  = / fu,R(u,  r)dr 
Jo 

r+oo 

= / C2rd-\  1 - u2)(<i-3)/2e-(^-27ru)/2i(_i)i)  (u)  dr 

Jo 

= c2(l  - u2)^-3^72"2/2!^!,!)  (ti)  [+  rd-1e-(r-7u)2/2dr  (2'9) 

Jo 

= C2(  1 - w2)(d-3)/2e7^V2  (27r)V2  (d  _ i)!^  (7tl)  l(_lfl)  (U) 

= c3(l  - (7u)  !(_!,!)  (u) , 


where 

c3  = (2tt)1/2  (d  - l)!ca  = (2tt)1/2  (d  - 1)!- 


(d-  1)  \e~T*/2 


' 2(rf-2)/27rl/2r  (<tl)  2<d-3)/2r  (^i) 

and  <Fjfc  is  the  Arth  repeated  integral  of  the  standard  normal  distribution  function  <f>. 
For  <!>*;  we  used  the  expression 

r+oo 


\ /*+oc 

<hfc  {%)  = — J zk4>  (z  - x)  dz, 


from  Appendix  A,  where  0 is  the  standard  normal  density  function. 

Next  we  find  the  expected  value  of  U. 

E ( U ) = J ufu  ( u ) du  = c3  J u(  1 — u2)(d_3)/2e72"2/2$d_1  (7 u)  du. 

To  the  best  of  our  knowledge,  there  is  no  known  closed  form  for  the  above  integral, 
so  we  go  back  to  the  integral  representation  of  $fc. 

r+oo 


/I  1 r+oo 

u(l-u2Yd-^2e^u'/2  / l0  (z  - ju)  dzdu 

1 (“  ~ !)■  Jo 


c3 


/I  r+oo 

J„  U(1-u2)i 


(d-1) 


3)/2e-y2u2/2zd-l_ 


o~(*-lu)2/2dzdu_ 


(2.10) 


(27T)1/2 

Notice  that  the  integrand  in  (2.10)  is  the  same  as  ufu,R.(u,r).  Since  \ufu,R{u,r)\  < 
fu,R(u,r)  and  Ju,r{u,  r),  being  a probability  density  function,  is  integrable,  by 
Fubini’s  theorem  we  can  interchange  the  order  of  integration  in  (2.10).  Thus,  we 
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obtain 

E{U)  = 


c3 


(d-  1)!(2tt)1/2 
Let  t — 7 z.  We  need  to  evaluate  the  integral 

A = J u(l-u2)(d-3)/2etudu 
(1  - u2f~l)/2 


r+oc  r i 

J zd-le-z2/2  J u{\-u2)(d-3)/2elzududz.  (2.11) 


= 0 + 


d — 1 
t 


+ 


u=— 1 


(l-u2) 


2\(d-l)/2 


d-1 


te  du 


d/ 2 1/2etudu. 


According  to  Abramowitz  and  Stegun  (1970),  formula  9.6.18, 

/> 


d/2_1/2  e‘“du  = --2-r  .^  + 2)  Id_ 


i\d/2 


(I) 


Therefore,  we  have 


g, 

where  Ik  is  a modified  Bessel  function  of  the  first  kind  and  order  A;.  Substituting  A 
in  (2.11),  we  obtain 


E(U)  = 


c3 


(d-1)!  (2tt) 


r+ 00 

T / Z*-1 

2 do 

^3  r 

2~'  do 


+°V" ^ 1>k  7"(72) 

2(d-3)/2p  C3  /-+oo 


(d  — l)!"/d/2 

r+00 

= c4  / zdl2e~z2l2Id  (7 z)  dz, 

do  2 


e-z*l22dl2-lnll2T 
d'2e-z2/2h  (7z)  dz 


(72) 


d/2-1 


dz 


2(d-3)/2F  C3  2(d-3)/2r  (d=l)  (d  _ l)!e-72/2  e-A/2 

(d  - l)!7d/2-1  “ (d  - l)!7d/2-l2(d-3)/2p  (<kzi)  ~ y/2-1  ' 


where 
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According  to  Abramowitz  and  Stegun  (1970),  equation  9.6.3 


U (yz)  = e~™niJd 
2 2 


where  denotes  a Bessel  function  of  the  first  kind  and  order  k.  Thus, 

r+o o 

E ( U ) = c4J  zd/2e-z2/2e-^niJi  [y ze^  dz 

/•+OO  9 

= c4e~*wl  J z2zW2+i)-i  dz. 

Now,  by  Abramowitz  and  Stegun  (1970),  formula  11.4.28, 

+oo 


/ 


-(l/V2)  z2  (d/ 2+1) 


dz 


r (HI)  + Hf  + i)) 


d/2 


( 


\ d/2+1 

U 

r (I  + I)  7d/2e3” 


2 (75)”  r (§  + 1) 


-M 


\ 


1 /'A  1 (d  , 

2 ( 2 ) + 2 ( 2 + 1 


+ 


4(^)  2^ 


. d Id  y‘ 

r(|  + l)21/2  mU  + 2’2  + 1’^ 


Therefore, 


F fU)  e‘lV2e_3"r  (I  + 1)  ,,  /<*  ,1  d J 

E(u)~ — y/2-.r  (I  + 1)  21/2 — M 1 ; + 


r(i  + » 


2 2 2 ’2 
did  2 


r (|  + 1)  V 2/  ~ V2  ‘ 2’ 2 

and  for  the  MRL  we  obtain 

p=|£(u)I=fStSc1/VCm(-'+^+1'c 

(Another  commonly  used  notation  for  the  confluent  hypergeometric  function  is  \F\.) 

□ 


Remark  2.1.  Note  that  p depends  on  the  mean  vector  /r  only  through  7 = ||/x||. 
Remark  2.2.  In  the  proof  of  Theorem  2.1,  it  is  shown  that  p is  the  expected 
length  of  the  projection  of  U onto  its  mean  direction  rj. 
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In  the  next  three  corollaries  we  show  how  this  formula  can  be  written  in  terms 
of  other  functions  in  the  special  cases  d — 1,  2, 3. 

Corollary  2.1.  Let  U ~ PNi  (//,  1).  Then  the  MRL  ofU  is 


p = 2$  (7)  - 1, 


where  7 = |/r|. 

Proof.  See  the  proof  of  Theorem  2.1. 


□ 


Corollary  2.2.  Let  U ~ PN2  (/z,/).  Then  the  MRL  ofU  is 


1/2 

p = l 7T  ) e_C  [^0  (C)  + h (C)] , 


f) 


(2.12) 


where  Iq  and  I\  denote  modified  Bessel  functions  of  the  first  kind  of  orders  0 and  1, 
respectively,  £ = 72/4  and  7 = ||/z||. 


Proof.  By  Theorem  2.1,  for  d = 2 we  have 


r(|  + iWn 


"2Nl/2 y2 12  A J f ^ 12  , 72 

T(l  + l)  \ 2 J € (2  + 2’2  + 1,  2 

r (I) 

r (2)  2*/2 

2-V/2  _^,2 /o  . , / 3 _ 72 \ 7 r1/2  _2 


7e"’’/2M  (5. 2.y) 


2 1/2 


-"ye 


2/2 M f-,2, 

\2  ’ 2 


7Tvi  2/0  . , I 3 „ 72 


7 e"7  /2M  - 2 

23/2 7 l 2’  ’ 2 


(2.13) 
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According  to  Abramowitz  and  Stegun  (1970),  formula  13.2.2, 


Ml  14 


r (2)  21-2e(1/2^2/2)  f1 


j e,  -(l/2j<7=/2i!(1  + j)2- 3/2-l(j  _ ()3/2-l 


r (2  - !)  r 

2-!e72/4 
7rl/22-l7rl/2 
e72/4  /•! 


dt 


7T 

4/4  r ri 

7T 


J e^2t/\l  + i)“1/2(l  - t)_1/,2(l  - 441  - t)1/2^ 

J1  e-72‘/4(!  _ *2)-l/2(1  _ 

f1  e-^\l  - t2)"1/2^  - J1  e~^4{l  - 1 2)-l'2tdt  . 

Next  we  evaluate  the  integrals 

Ax  = J e^2t/\l  - t2)-V2dt  and  A2  = J*  e~^4(l  - t2)~x/2tdt. 
By  Abramowitz  and  Stegun  (1970),  equation  9.6.18, 

Ax  = J'  e-^l\\-t2)-xl2dt  = 7T1/2r  Q)  I0  (^0  = 7 xx/\l/2h 

Integrating  by  parts,  for  A2  we  obtain 

M = j e~i2t/4(l-t 2)~x'2tdt 


= 7r/n 


= -e-^/4(l  - t2)1'2 

,,2  /■! 


0 


u=— i 4 

-^yV^a-*2)1-1/2^. 


f f\^2t/4{1-t2y/2dt 


Again,  by  Abramowitz  and  Stegun  (1970),  formula  9.6.18,  we  find 

a2  = 4 ['  e-7a(/4(1  _ = .V^rti  + l)  /V\ 

4 i-i  4(p?)  V4/ 
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Therefore,  after  substituting  Ai  and  A2  in  (2.14), 
-.2  \ o72/ 4 r 


M[?  2X)=  — 

1 2’  ’ 2 J 7T 


= e7^4 


+ - 


Using  this  expression  for  M in  (2.13),  we  find 
t r1/2 


23/2 


7e  '^/2e72/4 


'0 


□ 


Corollary  2.3.  Let  U ~ PAr3(^, /).  Then  the  MRL  ofU  is 

/2\ 1//2  e-72/2  / l \ 

P=(z)  — +(1-3)  (2^(7)- 1), 


7 


7 

where  7 = ||/xj| . 

Proof.  By  Theorem  2.1,  for  d = 3 we  obtain 


r(|  + |)  /72\1/2  .72/2,^3  13  , 72 

P T(§  + 1)  ( 2 J 6 V2  + 2’2  + 1,  2 

2V2r(|) r v 2 ’2  y 

= I 7e-72/2M  (2  ~ — 

21/2(1)  ^1/2 76  \ ’ 2 ’ 2 

Now,  by  Abramowitz  and  Stegun  (1970),  equation  13.4.8, 


(2.15) 


(2.16) 


(2.17) 


Next  we  find  the  derivative  of  M (l,  |,z)  with  respect  to  2.  First,  by  Abramowitz 
and  Stegun  (1970),  formula  26.2.37,  we  have  that 


3 x 5 


\ 1/2 


4>  (x) 


for  x > 0.  Let  2 = x2/2.  Then  x = (2 z)1^2  and 


M I !,  -,2 


3 \ : (r}1/2 


e2  4> 


,/2)  - 5)  • 
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for  2 > 0.  Therefore, 


7T\l/2 


S (f) 


= 7T 


1/2 


1/2  _ \ezz-l/2 


e~z 


+©1/24K<-),/2)4 


ef-'H) 

* (<22>'/2)  - 5 

i) 

*(<2*>1/2H 

1/2 


(2tt) 


1/2 


e_,2-1/2z-1/2 


1 

+ 2? 


Now  we  substitute  in  (2.17)  and  find 
5 \ 3 


n4*H{© 

For  z = 72/2  this  is 


1/2  / 1 

e2  1 

2z 


* ((22>1/2)  - 5 


+ ^} 


(2tt) 


1/2 


p72/2 


1 - 


7“ 


$(7)_  + 


7 


Substituting  the  above  in  (2.16),  we  obtain 


P = 


'ye 


2V2  (|)  ttI/2 

2 ~\ 1/22  e-72/2 
7 


(2tt) 


1/2 


pi/2 


7 


^7 


7 


+ 1- - (2$(7)-l). 


□ 


2.3  Comparison  with  the  Fisher  Distribution 

The  most  commonly  used  distribution  in  statistical  modeling  of  directional  data 
is  the  von  Mises-Fisher  distribution,  denoted  by  Md  (77,  k),  where  77  is  the  mean 
direction  and  k > 0 is  a concentration  parameter.  The  density  of  Md  (77,  k)  with 
respect  to  the  uniform  distribution  on  S d_1  is  (see  Mardia  and  Jupp,  2000,  p.  168) 

/«W/2-i  KT)’U 

f(u ) = 111 I 

/()  r(i)/,_l(K)’ 
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where  /„  is  a modified  Bessel  function  of  the  first  kind  and  order  v.  This  distribution  is 
usually  called  the  von  Mises  distribution  for  circular  data  and  the  Fisher  distribution 
for  spherical  data.  The  von  Mises-Fisher  distribution  is  unimodal  and  rotationally 
symmetric  about  r/.  Its  MRL  as  a function  of  the  concentration  parameter  n is  (see 
Mardia  and  Jupp,  2000,  p.  169) 


which  specializes  to 

Pf  = coth/t  — — , (2-18) 

Av 

when  d = 3.  This  distribution  can  be  derived  as  the  conditional  distribution  of 
a normal  random  vector  Y with  mean  kt)  and  identity  covariance  matrix,  given 
that  ||Y||  = 1.  Thus  the  von  Mises-Fisher  distribution  arises  as  the  conditional 
distribution  of  a normal  random  vector  given  that  the  vector  lies  on  the  unit  sphere, 
whereas  the  projected  normal  arises  as  the  distribution  of  its  projection  onto  the 
sphere. 

Next  we  compare  the  densities  and  MRLs  of  the  projected  normal  distribution 
and  von  Mises-Fisher  distribution  in  the  three-dimensional  case.  First  notice  that, 
besides  unit  vectors,  directions  in  three  dimensions  can  also  be  represented  by  the 
spherical  polar  coordinates  (0,<p)  € [0, 7r]  x [0,  27t).  Therefore,  instead  of  specifying 
the  distribution  of  the  unit  vector  with  respect  to  the  uniform  measure  on  the  sphere, 
we  can  specify  the  distribution  of  (6,  <p)  with  respect  to  me  x mv,  where  mg  and 
are  Lebesgue  measures  on  [0,7r]  and  [0,  27t),  respectively.  For  both  the  projected 
normal  distribution  and  Fisher  distribution,  if  we  choose  the  mean  direction  77  as  the 
pole,  then  9 and  p will  become  independent  (see  Watson,  1983).  The  distribution  of 
<p  is  uniform  on  [0, 2n)  in  both  cases.  Watson  (1983)  shows  that  the  density  of  9 for 
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the  Fisher  distribution  is 


fF(0)  = 


K 


K COS  0 


sin#. 


2sinh/c 

For  the  projected  normal  distribution,  we  derive  the  density  of  # by  making  the  change 
of  variables  u = cos  9 in  the  density  of  the  projection  of  U onto  its  mean  direction 
given  in  (2.9).  We  also  use  (2.3)  to  obtain 


fP  (0)  = 


sin# 


r / . on  ^ (7  cos#)  „ , 

[l  + (7  cos  #)2]  — — 4-  7 cos  # J>  . 


(27r)1/2e'i'2/2  ' w ’ J 4>  (7  cos  #) 

From  the  graphical  comparisons  of  fF  (#)  and  fp  (#)  on  Figure  2,  we  see  that 
the  two  densities  match  very  well  for  the  same  values  of  p.  Therefore,  any  Fisher 
distribution  can  be  closely  approximated  by  a projected  normal  distribution  with 
S = I with  the  same  mean  direction  and  mean  resultant  length.  Notice  that  Watson 
(1983,  p.  115)  used  approximations  to  compare  the  two  densities. 

In  Figure  2 we  also  plot  pP  and  pF,  given  in  (2.15)  and  (2.18).  As  expected, 
both  pP  and  pF  are  increasing  functions  of  7 and  k,  respectively.  We  also  see  that  for 
equal  values  of  7 and  k the  mean  resultant  length  of  the  projected  normal  distribution 
exceeds  that  of  the  Fisher  distribution. 

2.4  Conditional  Distribution  of  the  Radial  Part  Given  the  Direction 

We  end  this  chapter  with  a derivation  of  the  conditional  density  of  R given  U, 
where  R — ||y||.  We  will  need  this  result  several  times  later  on.  First  we  find  the 
joint  density  of  R and  U.  Let  rn  represent  Lebesgue  measure  on  (0,  +00)  and  A the 
uniform  measure  on  1 . Since  Y ~ N (fi.  I),  the  joint  density  of  R and  U with 
respect  to  m x A is  (see  Jennrich  and  Port,  1988) 


fit. 


u 


{r' u)  = («)  ■ 


(2.19) 
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Figure  2-1:  Left:  the  projected  normal  density  (solid  curve)  and  the  Fisher  density 
(dashed  curve)  for  p=0. 25, 0.5, 0.75, 0.9, 0.95.  Right:  Mean  resultant  lengths  of  the 
projected  normal  distribution  (solid  curve)  and  Fisher  distribution  (dashed  curve). 


Now,  using  (2.19)  and  (2.5),  we  find  that 
Jrju  (r,  u) 


fn\u  (r|u)  = 


fu{u) 


= )j  exp  - V'd-i  (/•*'«)  - y + (d-  l)logr| . 


(2.20) 


Note  that  the  conditional  density  of  R given  U depends  on  U only  through  the  cosine 
of  the  angle  between  U and  r).  that  is  on  U'r],  or  equivalently  on  U' fi.  Also  note 
that  the  density  in  (2.20)  belongs  to  an  exponential  family  with  canonical  parameter 
/Fit,  and  therefore  E(R\u)  = ipd-i  (p'u)- 


CHAPTER  3 

SPML  REGRESSION  MODEL 


3.1  The  Model 


Let  (xi,  wj) , . . • , (xn,  un)  be  independent  observations,  where  x,  is  a vector  of 
covariates  and  ut  is  the  corresponding  directional  response,  with  mean  direction 
rji  and  mean  resultant  length  pl  (possibly  both  depending  on  xt).  The  SPML 
model  treats  the  uts  as  projections  onto  the  unit  sphere  of  unobserved  response 
vectors  arising  in  a multivariate  linear  model.  Specifically  the  model  assumes  the 
representation 


where  the  random  vectors  Y , ~ Nd  (/xi5  E)  with  = B'xj,  and  B = ((31, . . . , f3d ) is 
a matrix  of  regression  coefficients.  The  components  of  /r,  are  thus  = x'if3J , j = 
1 ,d,  and  we  generally  assume  that  the  first  element  of  each  x,  is  1,  to  allow  for  an 
’’intercept”  term.  Under  the  SPML  model,  JJ i has  a projected  normal  distribution. 
To  fit  this  model  we  need  to  estimate  the  parameters  B and  E,  but  they  are  not 
identifiable  without  further  constraints,  since  different  values  for  B and  E could 
yield  the  same  distribution  for  U \ on  the  sphere.  To  see  this,  consider  the  random 
vector  Y*  = cYi,  where  c is  a positive  constant.  Then  Y*  ~ Nd  (fi*,  E”),  where 
fj,*  = cB'xj  = B*  Xi,  B*  = cB,  and  E*  = c2E.  But  the  projections  of  Y , and  Y*  on 
Sd~l  have  the  same  distribution  since 


Presnell  et  al.  (1998)  suggest  that  this  difficulty  could  be  addressed  by  restricting  the 
determinant  of  E to  be  equal  to  1,  but  they  take  a simpler  approach  assuming  that 


M II  cYi 


Yi  cY  i 
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S = I.  It  turns  out  that  this  version  of  the  SPML  model  provides  sufficient  generality 
for  many  applications.  Therefore,  we  also  assume  X = I.  With  this  assumption  (and 
suppressing  the  dependence  on  i),  the  density  of  U , is  given  by  (2.5).  Presnell  et  al. 
(1998)  focus  on  the  two-dimensional  case  of  circular  data.  They  present  likelihood 
calculations,  prove  that  the  log-likelihood  function  is  concave,  and  show  how  the 
maximum  likelihood  estimate  (MLE)  of  the  matrix  of  regression  parameters  B can 
be  computed  using  the  Newton-Raphson  and  EM  algorithms.  In  the  next  section  we 
provide  a generalization  of  these  result  to  an  arbitrary  dimension. 

Throughout  this  work,  we  assume  that  the  dimension  d could  be  any  positive 
integer.  Since  the  SPML  model  is  intended  to  model  directional  data  in  dimensions 
two  and  higher,  one  might  wonder  what  happens  with  this  model  when  d = 1.  In  this 
case,  we  have  a random  variable  Y ~ AT  (/x,  1),  /x  = /3'x,  and  we  let  U = y/|T|.  The 
random  variable  U takes  only  two  values,  -1  and  1,  with  corresponding  probabilities 
P (Y  < 0)  and  P (Y  >0).  Therefore, 


where  Z ~ N (0, 1).  Notice  that  this  is  exactly  the  probit  model  for  a binary  response. 
An  alternative  way  to  derive  the  distribution  of  U is  to  use  equation  (2.4)  for  the 
density  of  U with  respect  to  the  uniform  distribution  on  the  sphere  S°  {—1,1}. 


P (u  = 1)  = P (Y  > 0)  = P (Y  - n > -/x) 


= P (Z  > — /x)  = (1  — 4>  (— /x)) 
= $(M)  = <I>(/3,x), 


Notice  that  in  this  case  the  uniform  distribution  is  defined  as  the  distribution  that 


puts  mass  1/2  on  each  of  the  points  —1  and  1.  Thus, 


P(U  = 1)=  fu(u)d\  = fu(l)X(l) 


$ (7*7)  1 

1 *v2  rj 


= 4>  (777)  = $ (/x)  = 4>  (/3'x) . 
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Analogously, 

P (U  = —1)  = 1 — $(//)  = 1 — $ {/3'x) . 

Therefore,  it  turns  out  that  the  SPML  model  is  the  natural  generalization  of  the  well 
known  probit  model  to  two,  three  and  higher  dimensions. 

3.2  Computation  of  Maximum  Likelihood  Estimates 
Under  the  version  of  the  SPML  model  described  above,  using  formula  (2.5)  we 
obtain  that  the  log-likelihood  function  is 

n 1 n 

Z(B)  = 5Z  - 9 V-iPi  + constant> 

i=  1 Z i=  1 

where,  Hi  = B'x,.  To  find  the  MLEs,  we  derive  the  score  function  below. 

Let  <8>  and  vec  represent  the  usual  Kronecker  product  and  the  “vec”  operator, 
respectively.  The  vec  operator  transforms  a matrix  into  a vector  by  stacking  the 
columns  of  the  matrix  one  under  the  other.  We  will  use  the  following  well  known 
results,  which  can  be  found,  e.g.,  in  Magnus  and  Neudecker  (1988,  equations  (2.2.4), 
(2.2.7),  (2.4.3),  and  Theorem  2.2)  and  Searle  (1982): 

(A  0 B)(C  <g>  D)  = (AC)  0 (BD),  (3.1) 

(A®B)'  = A'®B',  (3.2) 

vec(ab')  = b ® a,  (3.3) 

and 

vec(ABC)  = (C'  ® A)vecB,  (3.4) 

where  A,  B.  C and  D represent  matrices,  and  a and  b represent  vectors,  assuming  of 
course  that  the  arguments  of  the  various  matrix  operations  are  conformable. 

The  contribution  to  the  log-likelihood  of  the  ith  observation  is 

h{ B)  = -(-  constant 
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The  differential  (see  Magnus  and  Neudecker,  1988)  of  Zj  is 


d h = - ^(d/x')/^  - ^'(d/x*) 

= i’d-\(v'i,ui)(&v'i)ui  - (d/x-)/x{ 

= (d/x'){^d-l(M'Mi)Ui  - /ij 

= *i(dB){^,i_i(/i,it*i)tti  - /x<} 

Since  d/,  is  a scalar,  we  have 


d/j  = vec  (d/i)  = vec  x'(dB ){V»d_1(/i-ui)ui  - /xj 


/xj'fgix'  vec(dB), 


where  the  last  equality  follows  from  (3.4).  It  then  follows  from  the  First  Identification 
Theorem  (Magnus  and  Neudecker,  1988,  Theorem  5.11,  p.  96)  that  the  row  vector  of 
partial  derivatives  is 

._  9li  _ ( dZj  dlj  dlj  dlj  \ 

<9(vecB)'  V dPi,i  ’ ’ " ’ d(3hp  ’ ‘ ’ d/3dti  ’ " ' ’ d/3d,p) 

= {lpd-l(tJl'iUi)Ui  - /xj'  ® 


In  the  notation  of  Presnell  et  al.  (1998),  the  contribution  of  the  ith  observation  to 
the  score  function  is 
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where  the  second  to  last  equality  follows  from  (3.3).  From  this  it  follows  that  the 
score  function  is 

n 

/'(B)  = £ 4(B) 

i=  1 
n 

= ^ vec  { (v'jUiWi  ~ Xix[ B} 

i=  1 

n 

= vec ^{xiipd-iin'iUiju'i  - x&i B} 

i=l 

/•  n n n 

= vec ) x<xiB  f 

^ i= 1 i=l  J 

= vecjX'MU  -X'XB}, 

where  X = (aq, . . . , x„)',  U = (tq, . . . , un)'  and  M = diagj^i^ui), . . . , 
'0d_i(/Lt(litn)}.  The  likelihood  equations  can  be  conveniently  written  in  matrix  form 
as 

X'XB  = X'MU.  (3.5) 

Unfortunately,  there  is  no  closed  form  solution,  but  the  MLE  can  be  obtained 
using  iterative  methods,  such  as  the  Newton-Raphson  algorithm.  To  implement  the 
algorithm  we  now  derive  the  Hessian  of  the  log-likelihood. 

To  obtain  the  Hessian,  we  consider  the  second  differential  of  the  log-likelihood, 

d2k  = (d [{ipd-iiii'iUju'i  -//'}< 8)  ®'])vec  (dB) 

= [{i’d-i(fJ-,lui){dn'i)ulu,l  - d/i-}  ® ®']vec  (dB) 

= {[{^'i){i’d-Mui)uiu'i  -Id}]  ®*;)vec(dB) 
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Let  A,  = 'ipd-i(f1iui)ulu'l  — \d  (note  that  A,  is  symmetric).  Then 


d2/j  = [{x'(dB)Aj}  8 x'jvec  (dB) 

= [{£c'(dB)Aj}'<g)£Cj],vec(dB) 

= ^vec[xj{x(  (clB)A,}])'vec  (dB) 

= [vec{(xix')(dB)Aj}]/vec  (dB) 

= [{A'  0 (xjx')}vec  ((dB))]'vec  (dB) 
= (vec  (dB))'{  Aj  8 (xjx')}vec  (dB) . 


by  (3.2) 
by  (3.3) 


by  (3.4) 

symmetry  of  xlx\ 


From  the  Second  Identification  Theorem  (Magnus  and  Neudecker,  1988,  Theorem  6.1.3) 
and  the  symmetry  of  A,  8 (xjx'),  it  follows  that  the  Hessian  of  Zj  is  the  dp  x dp  matrix 

li{ B)  = A i®  (xjx(). 


It  is  convenient  to  express  /j  as  a quadratic  form: 
Zj(B)  = Aj  (8)  (xjx() 


— IdAj  (8)  (xjX^) 

= (Id  <g>  Xj)(A,  8 x') 

by  (3.1) 

= (Id  <8  Xj)(AjId  <8  lx') 

= (Id  (8>  Xi)(Ai  <8)  l)(Id  8 x') 

by  (3.1) 

= (Id  8 Xj)Aj(Id  8 Xi )' 

by  (3.2) 

Thus  the  Hessian  of  the  full  log-likelihood  is 

n n 

1(B)  = ^ Aj  8)  (xjx')  = ^(Id  8 Xi)Ai(Id  ® Xj )'. 

i=  1 j=l 

In  Appendix  B we  prove  that  —/(B)  is  positive  definite  as  long  as  X is  of  full  column 
rank,  assuring  uniqueness  of  the  MLE  of  B . 
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The  Newton-Raphson  algorithm  proceeds  as  follows.  Given  starting  values  IT0) 
for  the  parameters  B,  the  current  estimate  of  vec  (B)  is  found  as 

vec  (B(fc))  = vec  (b^)  + —1  (b(/c-1})  ^ l , k = 1,2,..., 


and  this  process  is  repeated  until  convergence. 

It  is  also  natural  to  fit  the  model  using  the  EM  algorithm  described  in  Dempster 
et  al.  (1977),  treating  the  “unobserved”  Ri  = ||Yj||,  i = 1, . . . ,n,  as  missing  data.  The 
complete  data  model  is  then  the  multivariate  regression  model  for  Y = (Yj, . . . , Yn)', 
for  which  a sufficient  statistic  is  X'Y  = X'RU,  where  R = diag(7t’1, . . . , Rn) 
is  the  diagonal  matrix  of  unobserved  lengths.  For  the  E-step,  the  conditional 
expectation  of  the  sufficient  statistic  given  the  observed  data,  U,  is  E(X'Y|U)  = 
X'E(R|U)U.  Since  the  observations  are  independent,  E ( Rt j U ) = E(Rl\ul).  For 
a single  observation,  the  conditional  density  of  R given  u is  that  in  (2.20).  Notice 
that  this  density  belongs  to  an  exponential  family  with  canonical  parameter  ii'u. 
Therefore,  E(R\u)  = ^(/Yu),  so  that  E{X'Y\U)  = X'MU. 

The  M-step  is  especially  simple,  since  the  MLE  of  B given  the  complete  data 
is  just  (X'X)_1X'Y.  Thus,  given  a starting  value,  B^°),  say,  B(0)  = (0),  the  EM 
algorithm  proceeds  by  calculating 


M(fc)  = M(B(fc))  and  B(H1)  = (X'XJ^X'M^’U,  k = 0,1,..., 


until  B converges.  Note  that  this  amounts  to  solving  (3.5)  by  simple  iteration, 
treating  B as  a fixed  point  of  the  function  5(B)  = (X'X)_1X/M(B)U  (see  Thisted, 
1996,  p.  161). 


CHAPTER  4 

MULTI-SAMPLE  TESTS  OF  HYPOTHESES 
4.1  Model  Specification 
Let 

Uu,...,Ulni  ~ P Arrf(p1, 1) 


Uk i,  ■ • • , Uknk  ~ PNd(nk,  I) 

be  k independent  random  samples  with  unknown  mean  vectors  fi1, . . . , nk.  In  the  one- 
way layout  setting,  we  may  write  = B'x,,  where  B is  a k x d matrix  of  regression 
coefficients  and  aq  is  a k x 1 covariate  vector  playing  the  role  of  a population  indicator. 
We  treat  the  first  population  as  the  reference  and,  thus,  we  take  X\  to  be  the  vector 
with  first  element  equal  to  1 and  the  remaining  elements  equal  to  0,  and  x,  the  vector 
with  first  and  ith  elements  1 and  the  remaining  elements  0,  i = 2, . . . , k.  The  design 
matrix  X then  takes  the  form 

X — (®1  ,...,2Ji,...,35fc,...,35fc)  . 

' v ' ' v ^ 

ni  nk 

Suppose  that  we  are  interested  in  testing  the  hypotheses 


H0  : Hi  = ■ ■ ■ = /ht 

Ha  : /i;  / fj,j , for  at  least  one  pair  (i,  j). 


(4.1) 


This  can  be  accomplished  through  the  asymptotic  likelihood  ratio  test  (LRT)  using 
the  methods  for  likelihood  maximization  described  in  the  previous  chapter.  This  test, 
however,  simultaneously  tests  for  equality  of  mean  directions  as  well  as  equality  of 
MRLs.  To  see  this,  note  that  the  null  hypothesis  is  equivalent  to  rjl  = ■ • • = r]k 
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and  71  = • • ■ = 7fc,  where  r]i  = /x^/ ||/Xj ||  and  7 * = H/x^ ||  are  the  mean  direction  and 
the  concentration  parameter  of  the  ith  group,  respectively,  i = 1, ...  ,k.  In  many 
applications  with  directional  data  the  hypothesis  of  interest  is  the  equality  of  mean 
directions  itself.  In  the  next  section  we  develop  a test  for  equal  mean  directions  under 
the  assumption  of  equal  concentrations. 

4.2  A Test  for  Equal  Mean  Directions  Assuming  Same  Concentration 

In  this  section  we  present  the  asymptotic  likelihood  ratio  test  procedure  for 
testing  whether  two  or  more  projected  normal  populations  have  identical  mean 
directions  assuming  equal  unknown  concentrations.  This  is  a directional  version  of 
the  classical  one-way  ANOVA  setup  in  which  population  variances  are  assumed  equal. 
Because  of  the  way  the  mean  direction  and  the  mean  vector  of  the  projected  normal 
distribution  are  related,  this  test  is  equivalent  to  the  test  for  equality  of  mean  vectors 
assuming  that  they  have  the  same  lengths. 

Thus,  we  are  interested  in  testing  the  hypotheses 


H0  : »h  = ■ • • = Tfc 

Ha  : Vi  7^  V ji  for  at  least  one  pair(i,  j ), 
assuming  71  = • • • = 7 As  mentioned  above,  this  is  equivalent  to  testing 


(4.2) 


H0  : Mi  = — = Mfc 

Ha  : /q  7^  Hj,  for  at  least  one  pair 


(4.3) 


under  the  assumption  71  = • • • = 7*,.  To  test  (4.2)  or  (4.3)  we  can  use  the  asymptotic 
LRT  based  on  the  test  statistic 


Di  = 2 


Z(Bc)-Z(BHo) 


where  Z(BC)  is  the  maximized  log-likelihood  of  the  model  under  the  assumption  71  = 
• • • — 7 k and  /(Bh0)  is  the  maximized  log-likelihood  under  the  null  hypothesis.  We 
will  refer  to  this  test  as  Test  1.  Let  introduce  the  notation  (3i  = (/9,o, . . . , (l%(k-i))'  for 
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i = 1, . . . , d.  Notice  that,  under  H0,  = 0,  i = 1, . . . , d;  j = 1, . . . , k— 1.  Therefore  to 

get  the  maximum  of  /(B)  under  H0  we  can  use  the  procedures  described  in  Section  3.2 
with 


and 


B — (Pio,  ■ ■ ■ , Pdo ) > 


where  n = n\  H (-n/t-  Next  we  show  how  the  maximum  of  /(B)  under  7i  = • • • = 7fe 

can  be  obtained  using  the  method  of  Lagrange  multipliers. 

First  we  write  condition  71  = • • • = 7*,  as  the  system  of  equations 

M2  = M2 


M2  = IM2, 

or  equivalently 


MfcMfc  - Mi  Mi  = 0. 

Now  define 

k- 1 

Lc  (B;  0)  = Z (B)  + ^(Mi+iMi+i  - M1M1), 

i=l 

where  9 = (9\, , 6k~\)'  denote  Lagrange  multipliers.  In  Appendix  C we  have  shown 
that  the  derivative  of  LC(B.  9)  with  respect  to  vec(B)  is 


Lc  (B;  9)  = vec  (X'MU  - X'XB  + 2CB) , 
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where  C is  a k x k matrix  with  first  row,  first  column  and  main  diagonal  the  vector 
(0, 9\, . . . , dk-\)' , and  Os  elsewhere,  i.e. , 


( 0 0i 

0i  0i 


0fc— 2 9k- 1 ^ 

0 0 


c = 


0fc- 2 0 ...  0/c— 2 0 

y 0fc-i  0 ...  0 0fc_i  J 


In  Appendix  C we  have  also  derived  the  derivative  of  Lc  (B;  6)  with 
(Ci  (B) , . . . , Cfc_i  (B))'  where  C{  (B)  = /z'+1//i+1  - Mi/R-  Therefore, 
the  score  function  is 

( 4(B;0)  N 
Ci(B) 

s(B-e)=  ; 

V Cfc_ i (B)  y 

The  score  equations  S'  (B;  0)  = 0 can  be  solved  using  the  Newton- Raphson  algorithm. 
To  implement  the  algorithm,  again  in  Appendix  C,  we  have  derived  the  Hessian  as 


respect  to  6 as 
we  obtain  that 


H (B;  0) 


[(B)  + C (B;  6)  W'  ^ 

V W 0 


where  C (B;  6)  = 21^  (g>  C is  a dk  x dk  matrix,  W is  a (k  — 1)  x dk  matrix  defined 
in  Appendix  C,  and  0 is  a (k  — 1)  x (k  — 1)  matrix  of  zeros.  Having  found  D i, 
we  test  (4.2)  by  referring  D\  to  a chi-square  distribution  with  degrees  of  freedom 
dfi  = (d  — l)(k  — 1)  and  we  reject  H0  for  large  values  of  D\.  The  degrees  of  freedom 
are  determined  as  the  difference  in  the  dimensions  of  the  parameter  spaces  under 
the  alternative  and  null  hypotheses.  More  precisely,  for  k groups  in  Wl,  there  are  d 
free  parameters  under  the  null  hypothesis,  and  kd  — (k  — 1)  under  the  alternative. 
Therefore,  for  the  difference  we  have  kd  — (k  — 1)  — d = (d  — l)(/c  — 1). 
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We  have  carried  out  a simulation  study  on  the  size  of  Test  1 for  the  case  of 
two  groups  in  three  dimensions,  the  results  of  which  are  presented  in  Section  4.6. 
Because  the  test  is  anti-conservative,  we  decided  to  look  for  a distribution  that  better 
approximates  the  distribution  of  L)\  under  H0.  Meanwhile,  we  also  compared  Test  1 to 
other  tests  for  equal  mean  directions  assuming  a common  concentration;  namely,  the 
asymptotic  likelihood  ratio  test  assuming  an  underlying  von  Mises-Fisher  distribution 
(see  Mardia  and  Jupp,  2000,  p.  224)  (we  will  refer  to  it  as  the  LRMF  test;  MF  stands 
for  von  Mises-Fisher),  the  High-Concentration  F test  (see  Mardia  and  Jupp,  2000, 
p.  223),  the  ANOVA  test  (see  Mardia  and  Jupp,  2000,  p.  225),  and  the  nonparametric 
test  given  in  Fisher  et  al.  (1987,  p.  204).  From  simulations  on  the  size  of  these  tests, 
we  found  that  only  the  LRMF  test  works  reasonably  well.  The  other  three  tests 
performed  poorly.  Note  that  the  LRMF  test  is  easier  to  implement  than  Test  1, 
since  the  MLEs  for  the  mean  direction  and  concentration  of  the  von  Mises-Fisher 
distribution  are  available  in  closed  form. 

The  asymptotic  reference  distribution  for  the  LRMF  test  statistic  is  also 
X(d-i)(fc_i)-  However,  the  results  from  simulations  on  the  size  of  this  test  were  very 
similar  to  those  of  Test  1,  i.e.  the  chi-squared  approximation  is  again  inadequate  for 
small  to  moderate  sample  sizes.  For  all  the  reasons  mentioned  above,  we  decided  to 
look  for  a better  approximating  null  distribution  for  these  tests.  We  have  proven  that 
(see  Section  4.4)  for  high  concentrations,  the  distribution  of  (a  transformation  of) 
the  LRMF  test  statistic  is  approximately  F^-\)(d-i)\n-k)(d-\)-  The  similarity  of  the 
von  Mises-Fisher  and  projected  normal  families  demonstrated  in  Section  2.3,  suggests 
that  the  same  approximation  should  be  useful  for  Test  1 as  well.  Using  this  reference 
distribution  leads  to  better  size  performance  of  both  the  LRMF  test  and  Test  1, 
though  a rigorous  theoretical  justification  for  Test  1 still  needs  to  be  found. 
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4.3  A Test  for  Equal  Concentrations 

Since  Test  1 is  based  on  the  assumption  of  equal  concentrations,  it  is  useful  to 
have  a test  of  this  assumption.  Let  us  refer  to  the  LRT  for 

H0  : /Xj  = • • • = Hk 

Ha  : /zt  ^ fij,  for  at  least  one  pair 

with  no  constraints  on  ||/ij ||, . . . , ||/xfc||,  as  Test  3.  This  test  decomposes  into  two 
tests,  namely  Test  1 and  the  LRT  for 


Ho  : IIMiII  = •••  = ||M 

Ha  : H/ijll  7^  ||/Xj||  for  at  least  one  pair 


(4.4) 


which  we  will  refer  to  as  Test  2.  If  we  denote  by  D2  and  D3  the  LRT  statistics,  and 
by  d/2  and  d/3  the  degrees  of  freedom  for  Test  2 and  Test  3,  respectively,  then  we 
have  the  relationships 


D3  — D\  + D2 


d/3  = dfi  + d/2. 


(4.5) 


We  can  easily  compute  the  statistic  D$  using  again  the  methods  of  Section  3.2.  Also, 
note  that  d/3  = d(k  — 1).  Then,  from  (4.5),  we  find  D2  = D3  — Di  and  we  test  (4.4) 
by  referring  D2  to  a chi-squared  distribution  with  d/2  = d/3  — d/i  = k — 1 degrees  of 
freedom.  We  reject  (4.4)  for  large  values  of  D2. 

4.4  Approximating  the  Distribution  of  the  LRT  Statistics  for  the  von 
Mises-Fisher  Distribution  with  High  Concentration 

Suppose  that 


Un,...,Ulni  ~ Md(r/1, k) 


Ukl,...,Uknk  ~ Md(r]k,  k) 
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are  k independent  random  samples  of  sizes  rii, . . . , nk  from  von  Mises-Fisher 
distributions  with  mean  directions  r]1,. . . ,r]k,  respectively,  and  a common  concentration 
k.  We  wish  to  test  the  hypotheses 


(4.6) 


H0  : Vi  = ■ • • = Vk 
Ha  : Vi  7^  V j>  for  least  one  pair  ( i,j ). 

Let  U = (tin,  ■ • • , Uknk)'  denote  the  matrix  of  observed  directions  and  n — n\+-  ■ -An*, 
the  total  sample  size.  The  likelihood  function  is 


k rii 


Lmf  (U;  t/j,  . . . , Tjfc,  k)  = H / (uiU  uini ; r}i7  k)  = JJ  JJ  / (iiy;  Vi,  «) 


i=i 


i=i  j— i 


k rii 


= n n (f ) 2 ( r ( 2 ) /i-1  m ) exp 

i=l  j 

-(f) 


'«\n(5-1)  f (d' 

r I -!/-_!(«))  exp 


nk 


«kE  U\j  4 1-  v k Uki 

. \ 3=1  3=1  / 


Let 


Ri  = 


n, 


E 

3=1 


, for  i — 1, . . . , k, 


and  define 


Also  let 


R = n 1 'y^njRj. 


R = 


i=  1 


k rii 


1 ____  _ * 

;EE 

*=i  j=i 


and 


Id  (k) 

Ad  (k)  = , d > 1. 


Ji- l(«)’ 

Under  H0,  Vi  ~ ~ Vk  = V,  where  r)  is  unknown  mean  direction.  Under  H0,  the 

maximum  likelihood  estimates  of  rj  and  k are  (see  Mardia  and  Jupp,  2000) 


Vo 


lv-fc  y-n, 
n 2^i=l 

~ R 
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and 

K0  = V {R)  • (4.7) 

Under  HQ,  the  maximum  likelihood  estimates  of  r]1 , . . . ,r]k  and  k are  (see  Mardia  and 
Jupp,  2000) 


Va 


JL  X^ni  u 

rij  2^3=1  “'J 

Ri 


, i — 1,  . . . , k, 


and 


ka  = Ad1  (fi)  . 


The  likelihood  ratio  test  statistics  for  testing  (4.6)  is 

Lmf  (U;t70,  kq) 


LRfm  — 


RmF  (U.  ^7al>  • • • > Raki  ^a) 
Ko\n(^_1)  / ^4-i  (^°) 


X 


(k<j) y 

exp  (k0*7o  E"=i  «tf) 


exp  (ka  (r)'al  uU  + ■ ' ■ + Vak  E"=i  «fcj)  ) 


- (d— 2)/2 


rvwN 


K, 


(d- 2)/2 


7(d— 2)/2  (^a) 


exp  (nKoT?) 


exp 


= 


.(d  2)/2  j /(d_2)/2(K0) 

exp(K0Ad(K0)) 

ka~2)/2  J /(d- 2)/2(«a) 

exp(KaAd(Ka)) 

(4.8) 


(4.9) 


Note  that  .4^  is  a strictly  increasing  function  mapping  [0,  oo)  onto  [0, 1)  Schou 
(1978). 

By  formula  9.7.1  of  Abramowitz  and  Stegun  (1970), 


7(d— 2)/2(«) 


7TK 


x _ (d  — l)(rf  — 3)  + (d  - l)(d  — 3)(d2  — 4d  — 5)  + 0(k_3) 


8k  128k2 

as  k — > oo.  Using  this,  Schou  (1978)  determines  that 


(4.10) 


2k 


(4.11) 
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Let  y = y(/t)  = 1 — Ad(n),  or  equivalently,  k = Adl{\  — y ).  Then  y j 0 as  k — > oo, 
and  by  (4.11), 


V = 


d-1  (rf—  l)(d  — 3) 


+ 0(0- 


2k  8k2 

From  this  it  is  clear  that  k_1  = 0(y)  as  y J,  0,  and  more  precisely  that 
1 2 y (d-  3) 


k d — 1 4k2 

2y  , (d  - 3)  f 2y 


+ 

+ 

+ 

+ 


+ 0(«T3) 


d — 1 4 [d-  1 

2y  , (d  - 3)y2 


4-  0(k~2)  > +0(k"3) 


(4.12) 


rf-1  (d  — l)2 

2y  , (rf  - 3)y2 


+ 0(y«  2)  + 0(k  3) 

+ 0(y3)  as  y | 0. 


d-1  (d-1)2 

As  we  will  see,  these  expansions  are  actually  more  accurate  than  needed  here,  though 
the  additional  terms  might  be  useful  in  a more  refined  analysis. 

Watson  (1956)  shows  that  under  H0, 


2 nn 


d ) 

fw\ 

W\ 

/i  — Rj 

\w2j 

as  k — > oo, 


(4.13) 


where  W\  ~ X(n-*;)(d-i)  and  W2  ~ x'fk- 1 )(d-i)  are  independent.  By  the  continuous 
mapping  theorem,  it  follows  also  that 


2«n(l  - R)  — > ITi  + W2  ~ x2„_ !)(*£_!)  as  k 


oo. 


(4.14) 


These  results  imply  in  particular  that  1 — R = ()p(k  j)  and  1 — R — 0p(k  j)  as 
k — > oo.  Thus,  alternatively  taking  y = 1 — R and  y = 1 — R in  (4.12),  it  follows  that 


(4.15) 


and 


1 2(1  - R) 


+ Op(n  ) 


d — 1 


(4.16) 
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Returning  to  the  likelihood  ratio,  note  that  (4.10)  and  (4.11)  together  imply  that 
K{d- 2)/2 

7 -r^-exp(/tAd(«))  = K(d_1)/2  v^expj-Kfl  - j4rf(«)] } [l  + 0(k-1)  1 

Rd-2)/2(«) 

= «;(d_1)/2  v^2?r  exp|  — ■ + 0(k_1)|  [l  + 0(k-1)]  1 

= nl'-MVtoe-W'*  (1  + 0(0) 


as  k — > oo.  From  this  and  (4.15)  we  see  that 

fXd- 2)/2 

O rrTexp(kOJ4<i(«:o)) 

Rd-2)/2(«o) 

= + ^»-"'-1)/2  (1 + 0P(K-')), 

and  similarly  from  (4.16), 

£<.«*- 2)/2 

O — tt-t  exp  (/c0Ad(/ca)) 

Rd-2)/2VKa) 

= + Op(«'2)}~ld  l>/2  V2ie-{d~1)/2  (1  + 0,(K-')). 


Substituting  these  results  into  (4.9),  and  applying  (4.13)  and  (4.14),  it  follows  by 
Slutsky’s  theorem  and  several  applications  of  the  continuous  mapping  theorem  that 


LRfm  — 


2(1  — R)/ (rf  — 1)  + Op(n~2) 
2(1  — R)/(d  — 1)  + Op(k~2) 


2n«(l  — R)  + Op(k  *) 
R)  + Op(k_1) 

n(d— 1)/2 


IF,  + W2 


(d-l)/2 


(4.17) 


Similar  to  the  way  the  standard  ANOVA  F test  statistic  is  constructed  (see 
Scheffe,  1961,  p.  36),  we  construct  the  test  statistic 


F = 


(n  — k ) 

W^) 


(tv? 

\^nMF 
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From  (4.17)  it  follows,  again  by  the  continuous  mapping  theorem,  that 
„ i W2/\(k-l)(d- 1)] 

F Wy[(n  -k)(d-  1)]  ~ <»-*>«-.)• 

Note  that  this  is  the  same  asymptotic  distribution  as  for  the  High-Concentration 
F test  given  in  Mardia  and  Jupp  (2000,  p .223). 

4.5  An  Example 

Anthropological  researchers  have  long  been  interested  in  the  study  of  the 
evolution  of  the  relationship  between  form  and  function.  In  particular,  they  have 
been  interested  in  exploring  the  relationship  between  skeletal  form  and  type  of 
locomotion.  One  aspect  of  this  research  has  been  the  study  on  the  orientation 
of  the  last  lumbar  vertebra  posterior  elements  in  orthograde  primates  with  diverse 
locomotion  behavior.  In  this  example  we  consider  the  orientation  of  the  last  lumbar 
superior  facets.  Specifically,  we  consider  the  orientation  of  the  major  axis  of  the 
facet  as  it  describes  the  direction  in  which  the  facet  allows  the  greatest  range  of 
motion.  Figure  4-1  displays  a drawing  of  a human  lumbar  vertebra,  with  the  right 
superior  facet  labeled  as  the  superior  articular  process.  Tables  D-l,  D-2,  and  D 
3 in  Appendix  D give  the  directions  of  the  major  axis  of  the  last  lumbar  vertebra 
left  superior  facet  in  samples  of  three  species  of  primates:  Humans,  Gorillas,  and 
Chimpanzees.  The  data  are  plotted  in  Figures  4-2,  4-3,  and  4-4,  using  an  equal- 
area  projection  on  the  plane.  These  data  were  collected  at  the  American  Museum  of 
Natural  History,  Departments  of  Mammalogy  and  Anthropology  by  Dorion  Keifer, 
while  working  on  her  master’s  thesis  as  a graduate  student  at  the  Department  of 
Anthropology  at  the  University  of  Florida.  Details  of  her  study  can  be  found  in 
Keifer  (2005). 

Descriptive  statistics,  such  as  mean  direction  and  mean  resultant  length,  are 
presented  in  Table  4-1.  The  mean  resultant  lengths  of  the  three  groups  indicate 
moderate  concentrations  of  the  data.  Assuming  underlying  projected  normal 


40 


pr*x<*$ 


Sttf*  fior  nfiitniar 

pma* 


Trm»4ttr«t 


J nfcner 

f>rocu* 


Figure  4-1:  A human  vertebra. 


Figure  4-2:  Humans’  major  axis  superior  facet  directions  plotted  using  an  equal-area 
projection. 

Table  4-1:  Descriptive  statistics  and  p- values  from  Test  1 and  Test  2.  MD=mean 
direction;  MRL=mean  resultant  length;  Conc=concentration  for  the  projected 
normal  distribution. 


Species 

Mean  Direction 

MRL 

Human  (19) 

(-0.05,0.77,0.63) 

0.84 

Gorilla  (16) 

(-0.18,0.59,0.79) 

0.82 

Chimpanzee  (18) 

(-0.14,0.43,0.89) 

0.84 

p- value 

0.0002 

0.7256 
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Figure  4-3:  Gorillas’  major  axis  superior  facet  directions  plotted  using  an  equal-area 
projection. 


Figure  4-4:  Chimpanzees’  major  axis  superior  facet  directions  plotted  using  an  equal- 
area  projection  (excludes  observation  number  6 since  it  lies  on  the  lower  hemisphere). 
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Table  4-2:  Pairwise  comparisons:  p- values  from  Test  1 and  Test  2. 


Species 

p-value  1 

p-value  2 

Human 

Gorilla 

0.1711 

0.7777 

Human 

Chimpanzee 

0.0047 

0.9527 

Gorilla 

Chimpanzee 

0.2212 

0.7375 

distribution  for  the  data,  we  performed  tests  for  equal  concentrations  and  equal 
mean  directions  of  the  three  groups.  P-values  less  than  0.05  were  considered  to 
be  statistically  significant.  First  we  tested  for  a common  concentration  (Test  2) 
and  failed  to  reject  the  null  hypothesis  (p-value=0.7256).  Next  we  tested  equal 
mean  directions  assuming  the  same  concentration  for  the  groups  (Test  1).  The  null 
hypothesis  was  rejected  (p-value=0.0002),  and  we  concluded  that  at  least  two  of 
the  groups  have  different  mean  directions.  To  further  investigate  the  differences, 
we  conducted  pairwise  tests  for  equal  mean  directions.  The  p-values  from  these 
tests  are  presented  in  Table  4-2.  We  found  no  statistically  significant  differences 
between  the  mean  directions  of  the  pairs  Humans  and  Gorillas,  and  Gorillas  and 
Chimpanzees.  However,  the  mean  directions  of  Humans  and  Chimpanzees  were  found 
to  be  statistically  different. 

4.6  Simulation  Studies 

We  have  carried  out  a simulation  study  for  the  size  of  the  tests  described  in 
Section  4.2  and  Section  4.3  in  the  case  of  two  groups  in  three  dimensions.  For  each 
test  we  used  5000  random  realizations  of  the  test  statistic  for  each  combination  of 
the  sample  sizes 

(10, 10),  (10,  20),  (20, 20),  (20,  40),  (40, 40) 

and  values  of  p 


0.1,0.25,0.5,0.75,0.95,0.99. 
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For  simulation  purposes,  it  is  useful  to  note  that  these  tests  are  invariant  to  rotation 
of  the  sphere.  This  means  that  for  a given  test,  regardless  of  how  the  sphere  is  rotated 
around  its  center,  the  distribution  of  the  test  statistic  will  be  unchanged.  Because 
of  this  invariance  property,  under  the  null  hypothesis,  there  is  no  loss  of  generality  if 
we  simulate  from  PA/3  (/x,  I),  where  /x  = (7,0,0)  and  7 is  determined  from  formula 
(2.15).  To  generate  an  observation  from  PA/3  (/x,  I),  we  simply  generate  a AT3  (/x,  I) 
random  vector,  Y,  and  calculate  its  projection  onto  the  sphere,  U = Y /||Yj|. 

Plots  of  the  ordered  p-values  (y-axis)  versus  the  uniform  quantiles  (x-axis)  of 
Test  1,  Test  2 and  Test  3 are  presented  in  Appendix  E.  They  should  be  interpreted 
in  following  way:  for  a pair  of  numbers  the  number  on  the  y-axis  is  the  nominal  level 
of  the  test  and  the  number  on  the  x-axis  is  the  actual  size  of  the  test.  Therefore, 
if  the  test  statistic  is  referred  to  its  exact  distribution  then  the  numbers  in  the  pair 
would  be  very  close,  and  the  line  obtained  by  plotting  the  pairs  would  be  very  close 
to  the  45  degree  line. 

The  plots  for  Test  1 from  all  the  simulations  show  that  the  actual  size  of  the  test 
always  exceeds  the  nominal,  that  is,  Test  1 is  anti-conservative.  The  performance 
of  the  test  improves  as  we  increase  the  concentration,  and  as  should  be  expected, 
when  we  increase  the  total  sample  size  too.  To  improve  the  approximation  to  the 
null  distribution  of  the  test  statistic,  we  have  also  used  the  F reference  distribution 
discussed  in  Section  4.4.  Specifically,  instead  of  referring  the  test  statistic  D 1 to  the 
x\  distribution,  we  refer  (n-2)(LRp]^n ~ 1)  to  the  P2, 2(n-2)  distribution,  where  LRPN 
denotes  the  likelihood  ratio  statistic  for  Test  1 related  to  Ph  by  Pi  = -2  log  (LRPN), 
and  n is  the  total  sample  size.  In  general  for  k groups  in  d dimensions,  we  suggest 
referring  [(n  - k)/(k  - l)](LP“^/["(d_1)1  - 1)  to  the  F(fc-i)(«i-i)I(n-fc)(«i-i)  distribution. 
Plots  of  the  ordered  p-values  versus  the  uniform  quantiles  of  Test  1 with  the  new 
reference  distribution  are  also  displayed  in  Appendix  E.  We  see  that  the  new  reference 
distribution  approximates  much  better  the  true  null  distribution  of  the  test  statistic. 
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Even  for  a small  sample  size,  such  as  n\  = n2  = 10,  this  approximation  works  very 
well  as  long  as  p > 0.5,  which  is  considerably  smaller  than  the  sample  MRLs  we  have 
seen  in  applications.  For  really  small  concentrations,  p < 0.25,  this  approximation 
is  better  than  the  x|,  but  may  still  be  inadequate.  The  actual  size  of  the  test  for  a 
nominal  size  of  0.05  is  presented  in  Appendix  E,  in  the  rows  corresponding  to  0 in 
Tables  E-l  through  E-5. 

The  plots  from  the  simulations  on  the  size  of  Test  2,  show  that  the  test  is  anti- 
conservative for  values  of  p bigger  than  0.5  and  conservative  for  values  of  p less 
than  0.5.  To  improve  the  approximation  when  p > 0.5,  which  is  the  case  of  most 
applications,  we  decided  to  use  an  F distribution  again.  We  refer  the  test  statistic 
D2  to  the  Fii3(n_2)-„,  and  in  the  general  case  of  k groups  in  d we  suggest  using  the 
( k — distribution,  where  by  cF(jud2  we  denote  the  distribution  of  the 

random  variable  X = cY,  where  Y ~ Fdltd2 • The  idea  of  this  approximation  is  based 
on  the  fact  that  the  mFmj  distribution  has  a heavier  tail  than  the  Xm  distribution,  and 
W V as  l — > +oo,  where  W ~ rnFmj  and  V ~ Xm-  By  choosing  different  values 
for  the  denominator’s  degrees  of  freedom,  we  can  adjust  the  new  reference  distribution 
to  fit  better  to  the  actual  distribution  of  the  test  statistic.  The  way  we  determine  the 
denominator  degrees  of  freedom  here  is  analogous  to  the  way  the  degrees  of  freedom 
for  the  ANOVA  test  are  determined.  This  new  reference  distribution  approximates 
better  the  true  null  distribution  of  D2  for  p > 0.5,  but  approximates  it  worse  when 
p < 0.5.  A better  correction  is  yet  to  be  found. 

The  simulations  for  Test  3 show  that  the  test  is  always  anti-conservative.  Using 
the  same  argument  as  for  Test  2,  we  suggest  using  the  3F;!.3(n^2)-n  instead  of  xl  85 
the  reference  distribution.  In  the  general  case  of  k groups  in  d dimensions  we  would 
refer  the  statistic  D3  to  the  d(k  — 1 )Fd(k-i),d(n-k)-n  distribution.  This  approximation 
to  the  null  distribution  of  the  test  statistic  works  almost  perfectly  for  all  sample  sizes 
and  mean  resultant  length  combinations. 


45 


To  investigate  the  robustness  of  the  three  tests  to  changing  the  underlying 
distribution  of  the  data,  we  also  performed  simulations  with  data  from  Fisher 
distribution.  To  simulate  from  a Fisher  distribution  we  used  the  procedure  described 
by  Fisher  et  al.  (1987,  p.  59).  The  results  from  the  simulations  were  very  similar 
to  those  with  data  from  the  projected  normal  distribution.  Therefore,  we  concluded 
that  the  tests  can  be  used  with  data  from  the  Fisher  distribution  as  well. 

We  have  also  carried  out  simulations  for  the  size  of  the  LRMF  test.  Plots  of 
the  ordered  p-values  (y-axis)  versus  the  uniform  quantiles  (x-axis)  are  presented  in 
Appendix  E.  Similar  to  Test  1,  the  LRMF  test  is  also  always  anti-conservative. 
Using  the  result  for  the  asymptotic  distribution  of  the  transformed  LRMF  test 
statistics  from  Section  4.4,  we  corrected  the  test  by  changing  the  reference  distribution 
from  a xl  to  an  ^2,2(n-2)-  From  the  plots  we  see  that  this  approximation  to  the 
null  distribution  of  the  test  statistic  works  much  better.  Similar  to  Test  1,  the 
approximation  gives  very  good  results  for  p > 0.5.  Simulations  for  this  test  were  also 
done  with  data  coming  from  projected  normal  distribution.  We  found  no  differences 
in  the  results,  meaning  that  the  test  can  be  used  with  projected  normal  data  too. 
The  actual  size  of  the  test  for  a nominal  size  of  0.05  can  be  found  in  Appendix  E,  in 
the  rows  corresponding  to  0 in  Tables  E-l  through  E-5. 

Besides  the  LRMF  test,  in  the  literature  there  are  also  other  tests  for  testing  equal 
mean  directions  assuming  a common  concentration.  We  carried  out  simulations  to 
investigate  the  size  of  three  such  tests,  namely,  the  High-Concentration  F test  (see 
Mardia  and  Jupp,  2000,  p.  223),  the  ANOVA  test  (see  Mardia  and  Jupp,  2000,  p.  225), 
and  the  nonparametric  test  given  in  Fisher  et  al.  (1987,  p.  204).  Our  simulations 
showed  that  the  approximation  to  the  null  distribution  of  the  test  statistics  of  these 
three  tests  works  poorly,  and  thus  leads  to  inaccurate  results.  From  our  observations 
only  the  LRMF  test  is  performing  reasonably  well,  and,  in  fact,  the  results  from  the 
simulations  on  its  size  are  very  similar  to  those  of  Test  1 . 
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We  also  investigated  the  performance  of  the  corrected  versions  of  Test  1 and  the 
LRMF  test  in  terms  of  power.  We  again  considered  the  case  of  two  groups  in  three 
dimensions.  For  each  of  the  combinations  of  sample  sizes  and  values  of  p listed  above, 
we  simulated  5000  pairs  of  samples  for  each  of  the  following  eight  angular  separations 
in  degreed  of  the  true  mean  vectors:  5,  15,  20,  25,  30,  35,  40.  The  samples  for  Test  1 
and  the  LRMF  test  were  drawn  from  the  projected  normal  distribution  and  Fisher 
distribution,  respectively.  We  calculated  the  test  statistics  and  the  proportion  of 
times  they  exceeded  the  0.05  upper  quantile  of  the  F2i2(n-2)  distribution.  The  results 
are  summarized  in  Appendix  E in  Tables  E-l  through  E-5.  As  should  be  expected, 
the  power  performance  of  both  tests  improves  as  we  increase  the  sample  size  and  the 
concentration.  The  results  also  show  that  for  sample  sizes  of  at  least  n\  = «2  = 20 
the  tests  work  very  well  for  values  of  p > 0.75.  Also,  note  that  the  two  tests  have  a 
very  similar  power.  For  instance,  for  sample  sizes  n\  — ri2  — 20  and  p = 0.95,  if  the 
true  mean  directions  are  separated  15  degrees  then  the  null  hypothesis  of  equal  mean 
directions  is  rejected  with  probability  about  .8968  for  Test  1 and  0.8906  for  LRMF 


test. 


CHAPTER  5 

ONE-WAY  RANDOM  EFFECTS  MODEL 

5.1  Model  Specification 

Consider  the  model 


Yij  fJ,  -f-  CXi  T 
OLi  ~ Nd  (0,  a2J) 

E-ij  ~ Nd  (0,  I) 

OLi,  Sij  -independent 
i = 1,  ■ ■ • , k]  j = 1, . . . ,71*, 


where  /x  and  a2  are  unknown  constants.  This  model  treats  the  directional 
observations  U ij  as  projections  onto  the  unit  sphere  of  unobserved  response  vectors 
arising  from  a multivariate  one-way  random  effects  model.  Since  the  model  specifies 
that  U a, . . . ,Uini  are  independent  from  Uji, ... , U jn  , i ^ j,  the  likelihood  function 
of  the  model  can  be  written  as 

k 

/( U)  = /(tin,...,  u^)  = Ylf(un,...,uini), 

i=  1 

where  / (un, . . . , umi)  is  the  joint  density  function  of  Un, . . . , U iUi. 

It  is  natural  to  calculate  the  likelihood  of  Un,. . . ,Uini  by  first  specifying 
it  in  terms  of  the  conditional  distribution  given  the  random  effects,  and  then 
integrating  out  the  random  effects.  Since  Y n, . . . , Y ini\oLi  ~ Nd  (/x  + a*,  I),  we  have 
Un, . . . ,U in^oii  ~ PNd  (f-i  + q,,  I),  and  the  conditional  density  function  of  UtJ  given 
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OLi  IS 


f(Uij\(*i)  = 


( d-  l)!4>d-i  {u'ij  (/I  + oti)) 


(5.1) 


T (|)  22-1e5^+“‘)V+ai)0  ^ ^ + a.)) ' 

Using  the  independence  of  Un, . . . , Uini  given  a,  and  dropping  the  subscript  i,  we 
write  the  joint  density  function  of  Un, . . . , U \ni  as 


f(ui,...,un)  = / f (u1,...,un\a)  f (a)da  = 

J J 


II  /(«»!«) 


i=l 


f {a)  da 


Substituting  (5.1),  the  expression 

<M*)  1 


1 f°° 

= — / zke~^+ztdz,  k>  1, 
7o 


and  the  formula  for  the  normal  density  function  of  a,  we  have 


/ ( n.  i , . . . , un ) 

-/ 


]I  f r f - ) 2f-1e2(,x+Q)'(p+Q) 


U=i 


)T^* 


z?+zjUr(n+a)d 


e 


-Vq'q 


(2ttct2) 


da. 


Hoping  to  obtain  a more  useful  form  of  the  likelihood,  we  exchange  the  d- 
dimensional  integral  with  the  n-dimensional  integral. 

f(ui,...,un ) 


iexp(-(4+?)“'a+(p)“;'n'‘' 


~iz?+zj  u’jti 


a da 


dzi . . . dzn. 
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The  integrand  of  the  integral  with  respect  to  at  can  be  completed  to  have  the  form 
of  the  normal  density: 


Note  that  the  integrand  of  the  integral  above  is  the  density  of  the  normal  distribution 


Nd 


nfji 


o'= i 


and  therefore  it  integrates  to  1 over  W1.  Thus,  the  likelihood  can  be  re-expressed  as 


x exp 


1 

2 


ZjUj  — n/i 


dz\  • • • dZfi 


(K 


2 (nal  + 1) 


Ew  \^2zjuj 


j=i 


o=i 


n n 

--Vz-t  — 2 — ta*7 

9^3  ncr2  + 1 ^ 


j=l 


dzi  . . . (izrl 


7 = 1 


where 


+ l)M^ 


ci  = 


T -1 
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and 


C2 


rirhH 


(rial  + l) 2 exp 


n 


2 (nal  + 1) 


mV 


-i 


Let  z = (zi,...,Zn)'  and  U = (iti, . . . ,un)'.  Then  Yl]=  izjuj  = U'z  and  we  can 
rewrite 


In  this  notation 


f[z^j  e-^z-ayA~1(z~a)dz.  (5.2) 

Interestingly,  the  n-dimensional  integral  in  (5.2)  turns  out  to  be  a multivariate 
generalization  of  the  repeated  integral  of  the  standard  normal  distribution  function 
defined  in  Appendix  A.  For  the  multivariate  version  of  the  repeated  integral  we  have 
the  formula 

(fh 

\i=i 

where  <j>n(-,  £)  is  the  density  of  Nn(0,  £),  the  n-dimensional  normal  distribution  with 
mean  0 and  covariance  matrix  S.  Using  this,  the  joint  density  of  the  U j s can  be 
rewritten 

f(u1,...,un)=c  (det  (A))*  (a,  A) , (5.3) 


^ (f>n(z  - S)  dz, 


/(Ml,...,  un)  = c2e5t 


"7 

Jr* 


where 


c = (27r)^  ((d  — !)!)” 


n 


2 {nal  + 1) 


mV 
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Unfortunately  there  does  not  seem  to  be  any  simple  multivariate  analogue  of  the 
iterative  formula  for  computing  1)  from  ^(x,  1)  and  1),  that  is 

formula  A. 4.  Thus,  it  seems  that  it  would  be  very  hard  to  find  MLEs  by  directly 
maximizing  the  likelihood.  It  would  be  even  harder  to  do  so  for  models  with  more 
complicated  random  effects  structure.  In  fact  in  the  one-dimensional  case,  when  our 
model  reduces  to  the  well  known  probit  model,  it  has  been  noted  that  the  likelihoods 
of  such  models  pose  serious  computational  problems  and  techniques  such  as  Monte 
Carlo  methods  have  been  used  to  fit  the  model  (see  McCulloch  and  Searle,  2000).  In 
the  next  sections  we  discuss  four  ways  for  computing  estimates  for  fi  and  a 
5.2  Maximum  Likelihood  Estimation 

In  this  section  we  consider  three  methods  for  finding  maximum  likelihood 
estimates  for  the  parameters  [i  and  tr2:  direct  maximization  of  the  likelihood  using 
Gauss-Hermite  quadrature,  Markov  chain  Monte  Carlo  EM  algorithm,  and  Monte 
Carlo  algorithm. 

5.2.1  Maximum  Likelihood  Estimation  Using  Gauss-Hermite 
Quadrature 

As  we  saw  in  Section  5.1,  the  likelihood  function  of  the  ith  group  of  observations 
Un, . . . , U ini  is  not  available  in  closed  form  and  it  is  hard  to  maximize  it  directly  with 
respect  to  the  unknown  parameters  fj,  and  a2.  This  problem  can  be  solved  by  first 
approximating  the  likelihood  with  Gauss-Hermite  quadrature  and  then  maximizing 
the  approximation.  Here  we  concentrate  on  the  two-dimensional  case,  i.e.  d = 2,  but 
everything  can  be  similarly  derived  for  three  and  higher  dimensions. 

Dropping  the  subscript  i,  from  Section  5.1  we  have  that  the  joint  density  of  the 
ith  group  Un, . . . , U m%  can  be  expressed  as 


f(uu...,un)  = / 
Jr 

-s 

Jr 


n/« 

j=i 


f (at)  dot 


e_2  (m+“)'(m+“) 

3= 1 


1 4-  u'j  (n  + a) 


$ (u'j  (t*  + qQ) 
</>  (u'j  (h  + a)) 


(27rcr^)  1e  *JZaadat. 
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We  make  the  change  of  variables 


_ * / * _ * y 

a = (a1,  a2)  = 


a. 


and  obtain 


f(ui,...,Un)  = - / / 3(a;,«;)e 

^ Jr1  L7ri 


e-Q2  da 


(5.4) 


where 


_ g-z^+V^01*)  (/i+V^Sa*) 


j=l 


1 + u'j  (/*  + \/2 of  a*) 


$ (u'  (/*  + 

</>  («'  (y  + V^0*)). 


The  inner  integral  in  (5.4)  can  be  approximated  by  a univariate  Gauss-Hermite 
quadrature  treating  a2  as  fixed  (Monahan,  2001,  see,  e.g.  Section  10.4  of).  Let 
Xi  be  the  zth  zero,  i = 1, . . . , n,  of  the  Hermite  polynomial  of  order  n, 


Hn  (x)  = n!  ^ 


(~l)r 


(2s) 


n—2m 


— ' m\(n  — 2m)! 
m= o v ' 

where  [■]  denotes  the  integer  part.  Also,  define  the  weights 

2”-1n!-v/7r 


Wi  = 


n2  [i/n_ i (Xj)]' 


r,  i = 1,  ■ • • ,n. 


Then, 


h(a*2):=  [ g(al,al)ea?daltt'Srwig(xi,a*2). 

Jri  i=1 

We  can  use  Gauss-Hermite  quadrature  again  to  approximate  the  second  integral. 

f{u1,...,un)m-[  h (a£)  e~a^da2 
ft  Jr1 

1 " 

~ -^2wjh{xj) 

3= 1 

^ n n 

»;EE  WiW2g  (xj,  Xj) . 


t=i  j=i 
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While  it  is  relatively  easy  to  approximate  the  likelihood  of  the  single  random 
effect  model  using  Gauss-Hermite  quadrature,  it  would  be  very  hard  to  do  so  for 
models  with  more  complicated  random  effects  structure.  In  the  next  section  we 
develop  an  alternative  method  for  estimating  the  one-way  random  effects  model, 
which  will  offer  a greater  flexibility  when  building  more  complex  models. 

5.2.2  Markov  Chain  Monte  Carlo  EM  Algorithm 

In  this  section  we  show  how  to  find  MLEs  for  fi  and  o2a  using  the  EM  algorithm. 
Treating  the  unobserved  lengths  Rij  = ||>Gj||  and  random  effects  at  as  missing  data, 
the  complete  likelihood  of  the  model  is  then  the  likelihood  of 


Y 11 1 • • ■ j Y\ni , • • • , Y k i , . . . , Y kn/f  j > • • • > 


which  we  will  denote  by  fc  (Y,  a).  We  have  that 


fc  ^0  f (^~11 ) * ■ ■ j Y \ni , . . . , Y k\ , * * * , Y knk  |^1t  • • • i /*  (o^l , . - - , CKfc) 


oc 


,i=l 

k rii 


11/  («i) 


i= 1 


k rii 

mu™ 

_i=l  j- 1 


2 R ®i)  (Y ij  /J*  OLi 


i=l 


n/K) 

,i= 1 

1 , 

2 al' 


l!exP  1 _ol2a'a' 


nn-p 

_i= 1 j= 1 

/ k rii  ^ k ^ k rii 

a exp  ( /T  Y,  Y,  (yb  ~ a»)  “ ^ I]  aiai  ~ «*)'  OGi  - 


x 2 a2 

\ 2= 1 J=1  a 2=1  2=1  ji=l  / 

The  complete  data  likelihood  is  exactly  the  multivariate  normal  density  and  a 
sufficient  statistics  for  (n',cr2)  is 

t- - (f  £ (r«  - «<)',  t- <**)  - ( t £ < Wi  -«<)'•  t <*>.)  • 

\i=l  j= 1 i=l  J V i=l  j= 1 i=l  / 

Following  the  E-step  of  the  EM  algorithm  for  exponential  family  form  complete 
data  densities  described  in  Dempster  et  al.  (1977),  we  need  to  find  the  conditional 
expectation  of  T given  the  observed  data,  U = («u, . . . , ulni , . . . , uk\, . . . , Uknk)' . 
Unfortunately,  this  is  not  possible  since  it  requires  knowing  the  joint  conditional 
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density  of  Rij  and  at  given  the  observed  data  U,  which  does  not  exist  in  closed 
form.  Note  that  it  would  be  enough  if  we  knew  just  the  marginal  conditional 
distributions  of  and  at.  The  problem  of  evaluating  E (T  |U)  can  be  solved 
using  a Monte  Carlo  approximation  for  the  expectation.  Let  n = nj  + • • ■ + nfc 


R = (i?ii , ■ • • , .Rin! , • • • , Rkh  ■ ■ ■ , Rknk)'  the  n x 1 vector  of  unobserved  lengths,  and 
a = («! , . . . , afc)  the  d x k matrix  of  random  effects.  In  this  notation,  the  sufficient 
statistic  T can  be  rewritten  as 


Suppose  for  a moment  that  the  conditional  densities  / (R|U,  n,  <x^)  and  / (a |U,  /r,  cr^) 
were  available  in  closed  form  at  least  up  to  an  unknown  constant  and  were  easy  to 
simulate  from.  Let  r®  and  atl\  l = 1. . . . , m,  be  iid  realizations  from  these  densities. 
Then  a Monte  Carlo  approximation  of  E (TjU,  n,  a „)  would  be 


Equation  (5.5)  would  still  provide  a consistent  Monte  Carlo  estimate  of  E (TjU.  //,.  crj) 
if  {r^l\  a(^  } j were  a random  sequence  from  an  ergodic  Markov  chain  (see  Tierney, 
1994)  with  invariant  density  the  joint  density  of  R and  a conditional  on  U.  We  can 
use  Gibbs  Sampling  (see  Robert  and  Casella,  2004)  to  obtain  such  a sequence. 

First  notice  that  are  conditionally  independent  given  U and  R,  and 

the  conditional  distribution  of  a.i  given  U and  R depends  only  on  Ua, . . . , uirii  and 


be  the  total  sample  size,  n — (rq, . . . , rik)'  the  k x 1 vector  of  sample  sizes 


T = ((U'il  — an)' , vec  (a)'  vec  (a))  . 


(5.5) 


55 


Rii, . . . , Rini.  Therefore, 


/(aj|r,U)  = f(ai\Yn,...,Yini)  oc  / (an  Ya, . . . , Y ini) 

= f (YiU . . . ,Yini\ai)  f (oti)  = ( jj/ (Y  ijlaii,  v,  pi)  ] / (ct;|/z,  cr^) 


Kj=l 


OC 


riexP  ( YH  -P-  ai)'(Xij  - M - «i)^  exp  ^--^2 a'a, 

(rii  + a'aj  - 2 a,  | . 


Thus,  oti  given  U and  R follows  a normal  distribution 


Nd 


niaa  + 1 


£ 


o~ 


TijUij  - ri^i  I , 


WtO'a  + 1 


I 


Now  note  that  Rn, . . . , Rknk  are  conditionally  independent  given  U and  a,  and 
the  conditional  distribution  of  Rij  given  U and  ct  depends  only  on  a*  and  utJ. 
Therefore,  from  (2.20)  we  have 


/ (r,j|a,  U)  = / (rijiai,  uzj) 
1 


(d~  1)! 


exp  [-ipd- 1 ((M  + «i)'  Wjj)]  rf-  1 exp 


(n  + aj)'u 


v 


1 2 
-rfi 


(5.6) 


The  Gibbs  Sampler  procedure  alternatingly  simulates  from  the  densities  / (a|r,  U) 
and  / (i?|a,  U),  starting  with  suitably  chosen  initial  values  for  r,  say.  The  resulting 
sequence  of  (r,  a)  values  forms  a Markov  chain  with  invariant  density  the  true  joint 
density  of  R and  a conditional  on  U.  We  can  easily  simulate  from  / (a|r,  U)  since 
this  is  a normal  distribution.  It  is  not  trivial  to  simulate  from  the  density  / (i?|a,  U), 
but  we  next  show  how  this  can  be  done  using  the  Accept-Reject  algorithm. 

From  Section  2.4  we  know  that  E(Rij\al,  u%] ) = ipd-i  ((/■*  + at)'u(J).  Therefore, 
as  an  candidate  density  we  consider  the  density  of  the  normal  distribution  with  mean 
equal  to  the  mean  of  / (r^la*,  Uy)  and  variance  1,  that  is  N ( ipd-i  ((m  + ai)'  uij)  if)- 
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We  will  denote  this  density  by  g (r^ja:*,  utJ).  Dropping  the  subscripts,  we  have 


g(r\a,u) 


((d  — 1)!)  ^xpf-V’d-i 

'(p  + a )'  u)  ] rd  1 exp  [r  (p  + 

a)’  u — |r2] 

_ 1 

(27 r)  2 exp 

r — ipd-i  ((p  + a)  u) 

— (27t)2  ((d  — 1)!)  1 exp 


-i/jd-i  ({n  + a)'u)  + ~ (rpd-i  ((p  + a/u)) 


x rd  1 exp  -r  1 ((p  + ct)'  u)  - (p  + a/  u) 


oc  rd~1e~cr, 


where  c — ipd-i  ((p  + a)7  it)  — (p  + a)7  it.  Simple  calculations  show  that 


V’fc(ic)  - 


®k(x)' 


Since  the  repeated  integrals  Tfc(x)  are  positive  and  monotonically  increasing  functions 
in  x , it  follows  that  c > 0.  Therefore,  rd-1e-cr  achieves  its  maximum  with  respect 
to  r > 0 at  r = (d  — 1 )/c,  and  thus,  the  ratio  / (r|a,  it)  /g  (rja,  u)  is  bounded  from 
above  by 


M = (2tt)5  ((d  - 1)!)_1  (d  - l)d~] 


x exp 


a- 1 ((p  + «)'  u)  ~ (p  + «)'  w 
— V’a-i  ((A4  + a)' w)  + ^ (V’d-i  ((A4  + a)'it)^  - d+  1 


To  obtain  a random  realization  from  f (rtj  |a,U),  the  Accept- Reject  algorithm 
proceeds  as  follows. 

1.  Generate  random  variables  A ~ g and  U ~ U\o,i] , where  f/[o,i]  is  the  uniform 
distribution  on  the  interval  [0, 1]. 

2.  Accept  A as  a random  realization  from  (5.6)  if  U < f ( X ) /[Mg  (A)],  otherwise 
return  to  1. 

We  now  describe  the  complete  E-Step  of  the  EM  algorithm.  Let  p(p)  and  da(p) 
denote  the  estimates  of  p and  <r2  after  p cycles  of  the  algorithm.  Then 

1.  Given  initial  values  rd())  for  r,  for  t — 1, ....  N (N  large  enough)  generate 
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(a) 


a|°  ~ Nd 


E4 


riiCTa (P)  + 1 \j=l 


{t  l)Uij  - riifi (p) 


*2W 


niO-Q  p)  + 1 


i = 1, . . . , k 


(b)  rf) , given  ctf\i  = 1 .,k;j  = 1, . . . , n*,  according  to  the  Accept-Reject 


algorithm  described  above. 

Repeat  la  and  lb  to  get  approximate  draws  {r^,a^}  from  the  true  joint 
density  of  R and  a given  U. 

2.  Let  {r^,  be  the  last  m elements  of  the  sequence  produced  in  1,  and 

estimate  E ^TjU,  j^p\  da^^j  by 

(,  m , m \ 

-E(uv(,)-“<,)n)'.-E  vec  (a«)'  vec  («(0)  . 

m i=i  m i=i  ) 

For  the  M-Step  of  the  EM  algorithm  we  need  to  find  the  unconditional 
expectation  of  T. 

(k  m k \ 

i=l  j=l  i—1  ) 

(km  k \ 

E E - E («<))' . E E («!“■) ) 

i=l  j=l  i= 1 / 

(k  7~t ' k \ 

(l*  - °)'  . da°  ) = (n^>  kda a)  • 

i=  1 j=l  i=l  / 

In  the  third  equation  above  we  used  the  fact  that  -Va^a*  ~ and  therefore 
E (cx'iCti)  = cr^E  = a^d . The  new  estimates  ji[p+1'>  and  (d^)^p+1^  are 

determined  as  the  solution  of  the  equation 


E{T\ti,al)=E{T\XJ,v(p\<?l{p)). 
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After  simple  calculations  we  find  that 


(p+i) 


- m 

= — (U'r(0  — a(,)n) 

mn  ' 

1=1 


and 


- m 

al(p+1)  ^kdH^vec  (ail))'vec  (a(°) 


1=1 


5.2.3  Monte  Carlo  EM  Algorithm 

If  we  can  simulate  a,  directly  from  f (apun, . . . , uirlt),  i = 1, . . . , k,  then  on  the 
E-Step  of  the  EM  algorithm  we  can  estimate  the  expectation 


E (^"ijlU)  E (Y ij \iin , . . . , rrjni)  E (uijRij\un, . . . , u.inP) 

'U'ij  E (Ajjjua,  •••  T Him ) Eaj  [ E ( R%j  | u.,  [ , . . . , Uini:  qi,)] 

UijEai  [E  Q!i)]  UijEai  (ip  ipU'ij  ( P T 

m 

(=i 

where  is  a random  sample  of  size  m from  / (a2|  un, . . . ,uin .).  Note  that 


E (Rijluij,  an)  = ip  (uE  (/r  + aj)) 


follow  from  the  fact  that  f ( rij\uij , . p.  a^)  belongs  to  the  exponential  family  with 
canonical  parameter  itE  (p  + a,)  (see  Section  2.4).  Therefore,  the  expected  value  of 
the  sufficient  statistic  T can  be  approximated  by 

/ 1 m k rii  1m/c  \ 

„EEE M K 0* + “■ ’) ) - «.“>]  ■ E E E <*?V  • 


;=i  i=i  j=i 


(=i  i=i 


Note  that  this  approach  avoids  generating  r^.  We  next  discuss  how  to  simulate  from 
f (oti\un,  • • • , Uini)- 
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Dropping  the  subscript  i,  we  have 


f{a\uu  ...,un)  oc  f (a,  u\, . . . ,u„)  = f{uu.. . , u„|a)  / (a) 


II /(“iN 


a 


,i=i 

n 

n 


/(a)  = 


n 


$d- 1 K (m  + qQ) 
y 4 K (/*  + a)) 

Note  that 


(d-  l)!$d_i  (u'(/x  + a)) 

|L|  T (|)  25-1e>+“)'bi+aV  («'  (M  + a)) 

/ 1 / 1 \ , , \ 
exp  y~2  \^n  — 2 J a a ~ aJ  ■ 


e 2a<* 


(27T£t2) 


f[cj){u'l(fi  + a))  = f[  (2tt)  5 exp  («'  (/i  + a))2") 
i=l  i=l  L ' ' 

= (2tt)“?  exp  (u'i  (P  + a))2  j 

= (27 r)_i  exp  ^ ((u'^)2  + (™'a)2  + 2(u'/x)(u'a))  j 

= (2tt)“^  exp  (n'uju'tiji  + a'ttjit'a  + 2 n'uiU,ia)J 

-l  ^+a/  a+2/i'  (it 


= (27t)  2 exp 


, i=l 


Letting  U = (ui, . . . , un)'  as  before,  then  ^"=1  uiu'i  = U'U,  and 

" r i 

(A1  + oc))  = (27t)_2  exp  - 

i= 1 

Using  the  above,  we  obtain 


^ (/x'U'U/x  + a'U'Ua  + 2/i'U'Ua) 


/(a|iti,...,wn) 

n 

n^-i  («<(/*+“)) 

i=l 

nr=1  [*«-i  w(/i+o))] 


oc 


exp 


(ia'Q) 


exp 


exp 


lWa  , a'U'Ua  . . 
n + \ — n/x  a H h/iU  Ua 


1 ,, 


- -a'  (nld  - U'U)  a — fj,'  (nld  - U'U)  a 
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We  claim,  but  it  is  yet  to  be  shown,  that 


nUi  [Srf-i  K (M  + <*))] 

exp  (j^a'a) 

is  bounded  in  a.  After  the  bound  has  been  found,  we  could  generate  a from 
f (a\ui, . . . ,un)  using  the  Accept-Reject  algorithm  with  a candidate  density  the 
density  of  the  normal  distribution 

Nd(n,  (nlrf-U'Ur1). 


Notice  that  {nld  — U'U)  is  a positive  definite  matrix,  since 

(71  \ / 71  ^ 

nld  — ^2,  UlU'i  ) X = nx>x  ~ x>  I 'ui'ui 
t=l  / \i=l  / 

, , n2  (nx’x  Er=i(^)2V 

= «**-£>*)  = (pj?  - ||g||a  ■ j I 

= )w2>o  V 


X 


X 


t= 1 


as  long  as  the  vectors  u*  s do  not  lie  on  the  same  line  (which  happens  with  probability 
zero). 

5.3  Alternative  Estimators  of  p and  <x2 

Under  the  one-way  random  effects  model,  for  each  i = 1, ....  A:,  we  have  that 
Uij\ai  ~ PNd  (p  + oti,  I),  j = Treating  = g + Q;  as  an  unknown 

parameter,  we  can  compute  “the  MLE”  p,  of  pt  by  maximizing  the  conditional 
likelihood  function  of  U n, . . . ,U given  a,,  using  the  methods  described  in 
Section  3.2.  Since  a*  ~ Nd  (0,  cr2I),  it  follow  that  p;  ~ Nd  (/x,  ct2I),  i = 

Because  E = p,  we  can  estimate  p with  the  average  of  the  pt  s,  that  is 
p = t Ei=i  AV  Now,  let  flil  denote  the  Zth  component  of  the  vector  p(,  and  a,2 
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denote  the  sample  variance  of  the  Ith  components  of  the  p,  s,  i.e. 


where  p;  is  the  Zth  component  of  p,  that  is 


Then,  use 


In  this  example  we  fit  the  one-way  random  effects  model  to  a simulated  data 
set  in  R2,  through  the  MCMCEM  algorithm  and  also  by  directly  maximizing  the 
likelihood  using  Gauss-Hermite  quadrature.  The  data  set  consists  of  10  groups,  of 
sample  size  10  each,  simulated  from  the  model  with  p = (5, 0)  and  cr2  = 2.  Note 
that  marginally,  a single  observation  U ~ PN2  (p,  (<r2  + 1)1),  or  equivalently  U ~ 


j|p/(cr2  -|-1)||  = 5/ %/3  ~ 2.89,  which  corresponds  to  a moderate  concentration  of  the 
data  p « 0.88. 

First  we  fit  the  model  through  the  MCMCEM  algorithm.  For  the  EM  part  we 
run  500  iterations.  On  the  E-step  we  implement  a Gibbs  chain  with  3000  iterations,  of 


convergence  of  the  estimates  for  p and  cr2,  respectively.  At  the  238  iteration  of 
the  EM  algorithm,  the  norm  of  the  difference  of  the  estimates  for  p from  this  and  the 
previous  iteration  drops  below  0.0001.  The  estimates  for  p and  cr2  at  this  iteration 
are  p = (4.0034,  —0.0633)  and  d2  — 1.5942,  respectively.  To  explore  the  behavior  of 
the  estimates  when  using  more  observations  of  the  Gibbs  chain,  we  run  the  algorithm 


The  length  of  the  mean  vector  of  the  latter  distribution  is 


which  we  use  the  last  500  values  to  estimate  the  expected  value.  We  use  p((l)  = (2,  0) 
and  <7a(0^  = 1 as  starting  values  for  the  E-step.  Figures  5-1  and  5-2  show  the 
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but  now  using  the  last  2500  values  of  the  chain.  From  the  plots  in  Figures  5-3  and 
5-4,  we  see  that  the  variability  of  the  estimates  has  been  reduced. 

We  also  fit  the  model  by  first  approximating  the  likelihood  with  two  univariate 
Gauss-Hermite  quadratures  and  then  maximizing  the  approximation.  We  use 
univariate  quadratures  of  orders  2,  3,. . . , 30,  and  perform  the  maximization  using 
the  optim  function  with  Nelder  and  Mead’s  method  in  R,  with  the  same  starting 
values  as  for  the  MCMCEM  algorithm.  Note  that  using  a univariate  quadrature 
of  order  30  leads  to  an  approximation  with  30  x 30  = 900  quadrature  points  in  two 
dimensions,  and  30  x 30  x 30  = 27, 000  quadrature  points  in  three  dimensions.  We  plot 
the  corresponding  estimates  for  fj,  and  a2  in  Figures  5-5  and  5-6,  respectively.  Note 
that  the  estimate  for  the  second  component  of  //  does  not  seem  to  have  converged 
yet.  Therefore,  more  quadrature  points  are  probably  needed.  The  estimates  obtained 
from  using  two  univariate  quadratures  of  order  20  are  p = (4.0443,  —0.2728)  and 
a2  = 1.6212.  The  estimates  for  the  first  component  of  n and  a2  are  close  to  the  ones 
found  by  the  MCMCEM  algorithm.  A thorough  investigation  is  needed  to  understand 
in  detail  the  implementation  of  the  two  methods. 


63 


Figure  5-1:  The  left  and  right  plots  show  the  convergence  of  the  estimates  for  the 
first  and  second  elements  of  /z,  respectively,  using  the  MCMCEM  algorithm,  with  the 
last  500  values  of  the  Gibbs  chain. 
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Figure  5-2:  Convergence  of  the  estimate  for  using  the  MCMCEM  algorithm,  with 
the  last  500  values  of  the  Gibbs  chain. 
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Number  of  EM  Iterations 


Figure  5-3:  The  left  and  right  plots  show  the  convergence  of  the  estimates  for  the 
first  and  second  elements  of  /z,  respectively,  using  the  MCMCEM  algorithm,  with  the 
last  2500  values  of  the  Gibbs  chain. 


Figure  5-4:  Convergence  of  the  estimate  for  a2  using  the  MCMCEM  algorithm,  with 
the  last  2500  values  of  the  Gibbs  chain. 
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Figure  5-5:  The  left  and  right  figures  plot  the  estimates  for  the  first  and  second 
elements  of  //,  respectively,  found  by  directly  maximizing  the  likelihood  using  Gauss- 
Hermite  quadrature. 
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Figure  5-6:  The  estimates  for  cj\  found  by  directly  maximizing  the  likelihood  using 
Gauss-Hermite  quadrature. 


CHAPTER  6 

MIXED  EFFECTS  MODELS 
6.1  Random  Intercept  Model 

Suppose  we  have  the  model 

U-  = ^L. 

%3  \\Yv\\ 

ij  B Xij  + OL{  + 

(61) 

eij~Nd(  0,1) 

ct; , etj  -independent 

i 1)  • • • j fcj  j 1, . . . , Tii, 

where  the  p x d matrix  B = ((31, . . . . /3d)  and  o2a  are  unknown  constants,  and  the 
px  1 vectors  xt]  are  known  coefficients.  Note  that  this  model  treats  the  directions 
Utj  as  projections  onto  the  unit  sphere  of  unobserved  response  vectors  arising  from 
a single  random  effect  linear  mixed  model. 

As  in  Section  5.2.2,  we  can  use  the  missing  data  Rij  = \\YtJ ||  and  ct,  to  form  a 
complete  data  likelihood  and  then  use  the  EM  algorithm  to  find  estimates  for  B and 
G2a.  The  complete  likelihood  of  the  model,  fc  (Y,  a),  is  the  likelihood  of 

j ■ • • i Y , . . . , Y k \ , . . . , , oc  i , . . . , ot.k. 
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Proceeding  as  in  Section  5.2.2,  we  have 


/c  (Y,  a)  = / (Y n , . . . , Y ini , . . . , Yfcj , • • • , | Qc  1 » ...  5 ®fc)  f (®i  > ■ • ■ > ^fc) 


n/(^a,...,Yini|ai) 


a 


oc 


i=l 
A:  rii 


n/(- 


2=1 


fc  rij 


1 1 II/(V«I<*) 


.*=  1 j=l 


.1=1  j=l 
k rii 


[ | 1 I exp  ( --  (Yij  - B'xij  - oti)'  (Y tj  - B 'xij  - a, 


n/M 

i=l 

n/ 

exp  - 


j=i 


2 o* 


n exp  ( - 


i=i 


QtjQ=i 

2*2 


| I S I exp  --  (Y ij  - ax)'  (Y t]  - a,)  + (Y y - a 

i=i  j= l ' 

(k  rii  A k A k ni 

53  53  x6B  (y « - a«)  ~ ^ 53  ^ - o 53  53  - «•)'  c Y a - «*) 

*=i  j=i  ° *=1  t=i  j=i 

Since  x'^B  (Y,j  — a,)  is  a scalar  and  by  formula  (3.4), 

a^-B  (Y^  - = vec  (x'^B  (Y - a,))  = ((Y tj  - a i)'  ® x'^)  vec  (B) . 


Therefore 

k Hi  k rii 

53  53  x'oB  (yo  - = 53  53  ((y6  - «*)'  ® *y vec  (B) 

i= 1 j=l  i=l  j=l 

= ( 5353  (yb-  - ® ) vec  (B) , 

\»=i  j= l / 

and  thus  for  the  complete  data  likelihood  we  have 

f k rii  \ i fc 


fc  (Y,  a)  oc  exp 


53  53  - «*)'  ® vec  (B)  - ^-2  53  a'iai 

Ki=l  j= 1 / Q i=l 

fc  rii 


2=1  j = 1 

This  is  just  a multivariate  normal  density  and  a sufficient  statistic  for  the  fixed  effects 
vec  (B)'  and  variance  component.  o\  is 

t = (53  53  (y«  - «*)'  ® x/u>  53  a'iai ) ■ 

\ 2=1  Jl  = l 2=1  / 
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As  in  Section  5.2.2,  we  can  estimate  the  expectation  of  T given  the  observed 
data  U by 

m k rii  . ^ m k 


^ / # c.  r\j  i V'l  ^ mi  rv 

^eee^-av^Aee^ 


1=1  j=l  j= 1 


1=1  t=l 


where 


is  a realization  from  an  ergodic  Markov  chain  obtained  by  the  Gibbs  Sampler.  More 
specifically,  to  obtain  the  sequence  we  alternatingly  simulate  a,  and  Rij  in  the 
following  way.  Given  initial  values  for  RtJ , B,  and  a\,  generate  a,  from 


N, 


ni<ra 


2 , i ^ " ( rijuij  B Xij ) , 

Q "l~  J=1 


+ 1 


I . 


Then  generate  RtJ  from 

exp  [~ipd- 1 ((B 'x^  + oti)'  Uy)]  rj"1  exp 


rij  (B 'xij  + oti)'  Uij  - -rfj 


using  the  Accept-Reject  algorithm  with  a candidate  density  the  density  of  the  normal 
distribution 


N 


^ipd—i  ((B  -t-  a,:)  tip)  , lj  • 


Proceeding  as  in  Section  5.2.2,  the  bounding  constant  for  the  Accept-Reject  algorithm 
is  found  to  be 

M = (2t r)3  ({d  - 1)!)_1  (d  - l)d_1 


d—1 


((B 'Xij  + a,)'  u)  - (B'xij  + OLi)'  u) 


x exp 


-tpd-i  ((B 'xij  + OLi)'  u)  + ^ (ipd-i  {{B'xij  + oti)'  ■ 


u) ) — d + 1 


On  the  M-Step  we  find  the  unconditional  expectation  of  T.  Since  ~ we 


have 
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Also, 


/ k m 


k ni 


E E E <r«  - “<)'  ® <>  = E E - B (“■))'  ® <s 

\j=i  j= 1 / i=i  j=i 

k n.i  k rii 

~ y ] y , (B  xij  ~ 0)  <8)  ^ ^ ^ ] [(B  ®ij)  <8>  ®ij] 

i=l  j=l  i= 1 j=l 

= zi  £ tvec  = s ii  tvec  (xvxiJBi<i)]/ 

i=  1 j=l  i=l  j=l 

= J2  [(!d  ® (*«*#))  vec  (B)]/  = vec  (B)'  ^ £ (Id  ® (xoxo))  • 

1=1  j=l  t=l  j=l 

Then  we  set  the  conditional  expectation  of  T given  the  observed  data  U to  be  equal 
to  the  unconditional  expectation  and  find  the  new  estimates  for  B and 

k nj 


vec  ( B ) = 
and 


(xbxQ) 

_i=  1 j=l 


m k rii  \ 

\ 1=1  i= 1 j=l  / 


m k 


-2  1 (O'  (0 

(=1  i=l 

Note  that  Yli=i  (id  <S>  (x^x'j))  is  a dp  x dp  positive  definite  matrix,  and 
thus  its  inverse  does  exist.  To  see  this,  let  V be  a p x d non-zero  matrix,  and 
v — vec  (V).  Then 


v 


SE(id8>  (x«*y) 


i=l  j=l 
k n i 


k rii 

v = J2 v’  (Id  ® (xox'o)) v 

i=  1 j=l 

5]  S [(Id  ® (Xo4))  V}’  V = Y,  J2  tVeC  {XijXijV1d)}' V 

i=l  j= 1 i=l  j=l 

k rii  k rii 

= Y2Y^  tvec  (xijxijy)]'v  = 5Z  5Z  (Vxo  ® v 

i=l  j= 1 i=l  j=l 

= Y,  J2  (xov  ® xo)  v = 51  ii vec  (xp  vVxo) 

2=1  j=l  2=1  j = l 

= E E w'*«  = E E ( v'x«)'  c v'*«)  > °. 

t=l  j=l  i=l  j= 1 
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as  long  as  at  least  one  of  V'a^j  s is  nonzero,  which  is  assured  when  the  X matrix  is 
of  full  column  rank.  In  the  above,  the  third  and  seventh  equalities  follow  from  (3.4), 
the  fifth  follows  from  (3.3),  and  the  sixth  follows  from  (3.2). 

6.2  General  Mixed  Effects  Model 

We  now  consider  the  more  general  mixed  effects  model,  which  treats  the 
directions  U x as  projections  onto  the  unit  sphere  of  unobserved  response  vectors 
arising  from  a multivariate  linear  mixed  effects  model: 

U i- 

1 \\Yi\\ 

Yi  = B'xt  + A'zi  + €i 

A (c^ll  j j OL igi , • • • j OLSl ) • • j &sqs  ) 

oiri,  ■ ■ ■ , arqr  ~ Nd  (0,  a2 I)  (6-2) 

*~Nd(  0,1) 

a.rti  Ci  -independent 

i = r = 1,  — , s;  t = 1,.  ..,qr. 


where  the  p x d matrix  B = (/3j, . . . , (3d ) and  cr2, ...  ,a2  are  unknown  constants,  and 
the  p x 1 vectors  xt  and  (J2l=i  Qr)  x 1 vectors  Zi  are  known  coefficients. 

Treating  Ri  = |j  Y,||  and  A as  missing  data,  the  complete  data  likelihood  is 


/C(Y,A)  = /(y1,...,rn|A)/(A)  = 


s qr 


r=  1 t=  1 


(X 


n/(^iA)  nn/(M 

_i= 1 J |_r= 

f[exp  (Yi  - B'xt  - MZi)’  (Yi  - B'x ( - A' Zi) 

J=  1 ' 

S Qr  / / \ 

nn«p-^ 

r=l  t=l  k r ' 


But 


(Yi  - B'xi  - A'zi)'  (Yi  - B'xi  - A'zi) 

= (Yi  - A’zi)'  (Yi  - A! z^  + 2cc'B  (Yt  - A'Zi)  + x'BB 'Xi, 
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and  thus. 


1A1  ,r 


fc  (Y,  A)  OC  exp  X1B  (Yi  “ A'Zi ) - y Y ~2  E a'rtart 

\ i=  1 Z r=l  °T  t= 1 

As  in  Section  6.1,  we  can  write 

n / n \ 

Y (Yi  - A'z o = E (y<  - A'z‘)'  ® ^ vec  (B)  - 


i=l 


,t=l 


and  therefore 

fc  (Y,  A)  a exp  f [Y(Yi-  A' z '*)'  ® x*' ) vec  (B)  “ i E 4 E artart 


1 A 1 9r 


,»=i 


2 r=l  ^ t=l 


— ^ E (y*  ~ A'Zi)'  (Y  i — A'Zj)  ^ . 

A sufficient  statistic  for  vec  (B)/  and  a\ , . . . , o2s  is 

(n  <71  qs  \ 

Y (Yi  - A'zi )'  ® *i>  E aitai*»  • • • » E a*ia»«  ■ 

i=l  t=l  t=l  / 

Let  r = (ri,...,rn)  be  the  vector  of  unobserved  lengths,  r*  be  the  vector  r with 
the  ith  element  removed,  and  A*t  be  the  matrix  A with  the  qt  + 1 ) th  row 

removed.  To  implement  the  MCMCEM  algorithm  from  the  previous  section,  we  need 
to  alternatingly  simulate  from  the  conditional  densities 


f(art\r,  A*t,U),  l<r<s,l<t<qr  and  / (r<|A,  r* , U) , 1 < i < n. 
Define  zirt  to  be  the  (Yfi=l  Qi  + 1)  th  element  of  zt)  and  let 

Irt  {i  £ { 1, . • • , n}  . Zirt  ^ 0}  . 
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Also  let  z*rt  be  the  vector  Z;  with  zirt  removed.  Then 

f(aTt\r,  A^,U)  = f(<xrt\{Yi}ieIrt,A*rt)  oc  f (art, 

= f A*rt)  f (art\A*rt)  = f ({Yt}ieIrt  | A)  / (art) 

n exp  (Y * ~ B'ab  - A'2*)'  C*" t - B'xi  - A'zi)^  exp  a'rtart^ 

1 1 exP  ^ — 2 ~ B Xj  Artzirt  ocrtZirtJ  - B Zj  Artzirt  — (xrtZiT^j 

J&Irt  X > 


OC 


a 


x exP  ( — 0T2  a' 

r 

1 


a 


l »€/rt 

exp 


n«p  _2z^artart+ (r* 


B Xi  Artzirt  1 Z{TiOlti 


,eXp  ( 


2 y Zirt  T <j2^  ^rt^rt  4*  B Xj  ArtZirt'j  Zjrj^  OC 

Therefore,  we  can  simulate  art  from  the  normal  distribution 


( ( E (y-  - B'X<  - «r) 

\ \ieirt 

We  generate  Ri  from 

f(ri\r*,A,U)  = f(ri\A,ui) 

1 


-1 


%irt 


J2  Zirt  + ^2  ) ’ ( Z*t  + ~2 


Cie/rl 


\ielrt 


exp  [~i>d- 1 ((B'xj  + A 'zi)'  Ui)]  rf  1 exp 


^ (B'xj  + A' Zi)'  Ui  - ^r2 


(d-1)! 

using  the  Accept-Reject  algorithm  with  a candidate  density  the  density  of  the  normal 
distribution 

N (ipd-i  ((B'xj  + A'zi)'  , lj  . 

Analogous  to  the  previous  section,  the  ratio  of  / (r\|r*,  A,  U)  and  this  normal  density 
is  bounded  from  above  by 


M = (27 r)2  ((d  - 1)!)  1 (d  - l)d  1 (ipd- 1 ((B'x,  + A ' z^  u)  - (B'x^  + A 'zi)'uj 
-ipd-i  ((B'xjj  + A'zi)' u)  + ^ (ipd-i  ((B'xy  + A'zi)' u)^j  - d+  1 


x exp 
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On  the  M-Step  the  new  estimates  for  B and  of,  r = 1, . . . , s,  are  found  as 

T — 1 _ _ 


vec 


(b)  = 


HZ)  fa  »(*«*'#)) 

,i=  1 j=l 


X, 


i= 1 i= 1 


and 


m qr 

= s, 

171  1=1  t=  1 


where  is  a realization  from  / (art|r,  A*t,  U), 


A (O'  - a(‘)  a(D  a(0  ^ 


and  r-^  is  a realization  from  / (rj|r*,  A,  U). 


CHAPTER  7 

SUMMARY  AND  FUTURE  WORK 

The  focus  of  this  work  has  been  the  SPML  model  for  directional  data  in  the 
general  d-dimensional  case.  A formula  for  the  mean  resultant  length  of  the  projected 
normal  distribution,  which  is  the  underlying  distribution  of  the  model,  was  derived  for 
an  arbitrary  dimension.  Likelihood  calculations  and  methods  for  computing  MLEs 
for  the  SPML  regression  model  were  presented,  and  the  uniqueness  of  the  estimates 
was  proven.  The  model  was  used  to  develop  multi-sample  likelihood  ratio  tests  of 
hypotheses,  such  as  testing  equal  mean  directions  assuming  a common  concentration, 
equality  of  concentrations,  and  identical  mean  vectors.  It  was  demonstrated  through 
simulations  that  these  tests  perform  reasonably  well  in  terms  of  size  when  the 
data  concentration  is  moderate  to  high.  One  issue  related  to  the  first  two  tests 
is  the  stationary  point  in  the  maximization  of  the  likelihood  under  the  constraint 
of  equal  concentrations.  In  all  the  simulations  the  stationary  point  wras  successfully 
achieved,  however,  a theoretical  proof  of  a global  and  unique  maximum  still  has  to 
be  established.  The  methods  of  the  nonlinear  programming  might  be  needed  for  this 
purpose. 

The  large  concentration  asymptotic  distribution  of  the  transformed  likelihood 
ratio  test  statistic  for  testing  equality  of  mean  directions  with  same  concentration 
assuming  an  underlying  von  Mises-Fisher  distribution  was  derived.  It  was  shown 
through  simulations  that  it  approximates  very  well  the  distribution  of  the  test 
statistics  under  the  null  hypothesis.  It  was  also  demonstrated,  again  by  simulations, 
that  this  distribution  approximates  well  the  null  distribution  of  the  likelihood  ratio 
test  statistic  under  the  SPML  model  too,  however,  a rigorous  theoretical  justification 
for  this  is  yet  to  be  found.  Asymptotic  expansions  of  the  likelihood  function  might 
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be  useful  in  achieving  this.  The  two  tests  were  compared  in  terms  of  both  size  and 
power,  and  were  found  to  perform  similarly  well. 

The  SPML  model  was  extended  to  include  random  effects  as  well.  Four  different 
methods  for  estimating  the  parameters  in  the  special  case  of  a one-way  random  effects 
model  were  considered:  direct  maximization  of  the  likelihood  using  Gauss-Hermite 
quadrature,  Markov  chain  Monte  Carlo  EM  algorithm  (MCMCEM),  Monte  Carlo 
algorithm,  and  a method  based  on  the  results  from  maximizing  the  fixed  effects 
regression  setup.  For  the  more  general  cases  of  a random  intercept  and  mixed  effects 
models,  an  MCMCEM  algorithm  was  developed  to  estimate  the  parameters.  A 
thorough  investigation  should  be  done  to  understand  the  implementation  of  these 
methods  in  detail. 


APPENDIX  A 

REPEATED  INTEGRALS  OF  3> 

Let  (f)( •)  and  <&(•)  represent  the  standard  normal  density  and  distribution 
functions,  respectively.  Define  $_i(x)  = 4>(x)  and  for  k = 0, 1, , 


$*0*0  = f <Et-i  (z)dz. 

J — oo 


(A.l) 


Then 


$o(x)  = $(x)  = f (p(z)  dz  = f (x  - zf(j>(z)  dz 

J —oo  J — oo 

and  assuming  that 

1 f x 

$ k (x)  = J (x  - z)k<f>(z ) dz 

for  k = 1, . . . , K,  we  have 

$k+i(x)  = f $k :{z)dz, 

J — OO 


(A.2) 


-OO 

fX 


= F h f {z  ~ t)K<p{t) 

J —oo  * J — oo 

hyJjx~tn,) 


dtdz 

dt 


(K  + 


dt. 


Thus  it  follows  by  induction  that  (A.2)  holds  for  all  k = 0, 1, 2, 

Remark  A.l.  Note  that  the  analogue  of  (A.2)  actually  holds  for  repeated 
integrals  of  any  density.  However,  the  remainder  of  this  section  is  more  specific 
to  the  normal  case. 

Using  the  fact  that  </>(■)  is  an  even  function  and  the  simple  change  of  variables 
y = x — z in  (A.2),  yields 


1 f00 

$fc(x)  = — J zk<p(z  - x)  dz 


(A.3) 
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as  an  alternative  expression  for  the  Ath  repeated  integral  of  4>. 

Finally,  note  that  can  be  computed  recursively  using  the  identity 

k$k(x)  = $fc_2(x)  + (A.4) 


for  k = 1,2, This  can  be  shown  in  a number  of  equivalent  ways.  For  example, 

from  (A. 2)  and  the  fact  that  d(j){z)/dz  = —zcf)(z),  the  case  k = 1 follows  immediately: 


$i(x)  = f (x  — z)(f>(z)  dz 

J — OO 

/X  rx 

—zcj){z)  dz  + x (j)(z) 

•oo  J — OO 


dz 


= (j){x)  + x<f>(x)  = 4>_i(x)  + x4>0(x). 


Now  assuming  that  (A.4)  holds  for  A;  = 1 we  have  by  substitution  and 

integration  by  parts, 


K$k+1(x)  = j K$k(z)  dz 

J — OO 

= [ [§K-2(z)  + Z$K-i{z)\dz 

= 4>/c_i(x)+  f z$K_i(z)dz 

J —OO 

= $AT-l(x)  +x$K{x)  - f §K(z)dz 

J — oo 

= $K-i(:r)  + x$K(x)  - 4>/c+i(x) 


from  which  (A.4)  follows  for  k = K + 1. 

We  next  define  a multivariate  generalization  of  the  repeated  integral  of  the 
normal  distribution.  Let 

= 0n(®,  E), 

where  </>„(■,  S)  is  the  density  of  Nn( 0,  £),  the  n-dimensional  normal  distribution  with 
mean  0 and  covariance  matrix  £.  Then,  for  k > 0,  define 


4>tf](x,E)=  f 

J — OO 
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Of  course  <f>|?(£C,  E)  = $„(*,  E),  the  cumulative  distribution  function  of  Nn(0,  E). 
It  is  easily  shown  by  induction  that  for  k > 0, 


s)  = J {f[fr  ~ *i)  j K{t,  S)  dt , 

and  by  a change  of  variable,  that 

$!?(*.  E)  = <t>n(z-x,  E)dz. 


Unfortunately  there  does  not  seem  to  be  any  simple  multivariate  analogue  of  the 
iterative  formula  for  computing  $^\x,  1)  from  1)  and  <J>f-2^(a:,  1)  (that  is 

formula  A. 4).  However,  we  suspect  that  the  techniques  of  Genz  (1992)  could  be  used 
for  computing  <E>^(x,  E). 


APPENDIX  B 

POSITIVE  DEFINITENESS  OF  THE  NEGATIVE  HESSIAN 
We  will  show  that  in  order  for  the  negative  Hessian  of  the  log-likelihood  to  be 
(strictly)  positive  definite,  it  is  sufficient  that  the  matrix  X = (xi, . . . , xn)'  have  full 
rank  p (we  assume  p < n). 

To  do  so,  it  is  sufficient  to  show  that  — Aj  = — Aj(B)  is  (strictly)  positive  definite 
for  all  i and  for  all  B.  To  see  this,  note  that  if  X has  full  column  rank,  then  for  any 
dB  = (d/3j, . . . , d/3d)  / 0,  there  exists  a j,  1 < j < n,  such  that,  e.g.,  x'd /31  / 0 (we 
assume  without  loss  of  generality  that  d (31  0).  From  this  it  follows  immediately 

that  (Id  <g>  x'j )vecdB  = (x'-d/3x, . . . , x'jd/3d)'  ^ 0,  and  if  —A j is  positive  definite,  then 

— (vecdB)'/j(B)(vecdB)  = — [(Id  <8>  x')vecdB]'Aj[(Id  (8)  x')vecdB]  > 0.  However,  if 
—A i is  positive  definite  for  all  i = 1, . . . , n,  we  also  have  — (vecdB)'Z(B)vecdB  > 

— (vecdB)'Z_,(B)vecdB  > 0,  as  desired. 

We  thus  consider  the  matrix  — Aj  = Id  — ■0d_i(/x'uj)wju'.  If  v is  an  arbitrary 
cZ-dimensional  unit  vector,  then 

-v'A iV  = v'v  - = 1 - {v'u)2'ipd_i{p!lul). 

Since  0 < ( v’u )2  < 1,  it  suffices  to  show  that  ^^(Z)  < 1 for  all  t,  — oo  < t < oo. 

Let  k > 1 be  an  integer  (k  = d — 1 in  the  above  notation),  and  recall  that 

ipk(t)  = log {<hfc(z)/0(z)}  = log {<Lfc(f)}  - log{0(z)}. 

Now, 

= iw = -*• 
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implies  that 

^log{0(t)}  = -1, 

and  hence 

i>(t)  = ^log  {$fc(t)}  + 1, 

or  equivalently, 

1 - 4>k(t)  = -^2  log{$fcW}- 

Thus  showing  that  1 — ipk(t)  > 0 for  all  t is  equivalent  to  showing  that  <f>fc(t)  is  strictly 
log-concave.  But  this  follows  by  induction  from  the  following  Lemma  in  conjunction 
with  the  fact  that  0(f)  is  strictly  log-concave. 

Lemma  B.l.  If  f is  a ( strictly ) log-concave  function , then  it’s  indefinite  integral 
9(t ) = fa  /(s)  ds  (a  is  arbitrary)  is  also  (strictly)  log-concave. 


Proof  Let  h(t)  = log{/(t)}.  Then  (d/df)log{5(f)}  = f(t)/g(t),  and 


^iog  {,(*» 


92(t) 

h(t)ehdl  f*  eh W ds  — e2hdl 


92(t ) 


W) 


h(t)  f i 

J a 


Ms)  ds  - eft(t) 


Since  h is  concave,  h is  decreasing,  and  thus 


h(t)  f 

J a 


eh^  ds  < f h(s)ehl's ^ ds 
J a 

— ehh)  _ g/*(“)  < g h(t) 


Thus  (d[2]/d£2)  log{g(f)}  < 0 and  g is  log-concave. 

If  / is  strictly  log-concave,  then  h is  strictly  decreasing,  and  the  same  argument 
shows  that  (d[2]/df2)  log{g(t)}  <0.  □ 


APPENDIX  C 

MAXIMUM  LIKELIHOOD  CALCULATIONS  FOR  THE  MULTI-SAMPLE 
PROBLEM  WITH  EQUAL  CONCENTRATIONS 

In  Chapter  4 we  defined  LC(B;  6)  as 

fc-i 

LC(B;  6)  = l (B)  + {Vi+iVi+i  ~ R>i)  • 

i=  1 

Define 

Ci  (B)  = — //j/ij,  i = 1, . . . , k — 1, 

k- 1 

C(  B;0)  = ^0iCi(B). 

j=i 

In  this  notation,  LC(B;  6)  can  be  rewritten  as 

Lc(B;0)  = /(B)  + C(B;0).  (C.l) 

Notice  that 

Ci-l  (B)  - /x'/x2  - /x>! 

= x'BB'x*  — aiiBBT] 

= vec  (x'BB'x*)  — vec  (a^BB'cci) 

= (x'B  (8>  a:')  vec  (B)  — (a^B  g xQ  vec  (B) 

= (B'xi  g Xi)'  vec  (B)  — (B'xi  g Xi/vec  (B) 

= (vec  ( Xi  x[B))'  vec  (B)  — (vec  (XiXjB)/  vec  (B) 

= ((Idgxtx')vec(B))'vec(B)  - ((Id  g xix\)  vec  (B))'  vec  (B) 

= vec  (B)'  (Id  g x*x')  vec  (B)  — vec  (B)'  (Id  g Xix'Q  vec  (B) 

= vec  (B)'  (Id  g (xix'i  - XiXj))  vec  (B) , 
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where  the  third  equation  follows  from  the  fact  that  I'BB'xj  is  a scalar,  the  fifth  and 
eighth  equation  follow  from  (3.2),  the  sixth  follows  from  (3.3),  the  forth  and  seventh 
from  (3.4).  Denote 

Ij  — X^X^  X\X j. 

Note  that  I*  is  a k x k matrix  with  zth  elements  in  the  first  row,  first  column,  and 
main  diagonal  equal  to  1,  and  the  rest  of  the  elements  equal  to  0.  In  this  notation, 

Ci _!  (B)  = vec  (B)'  (Id  0 I*)  vec  (B) . 

This  expression  is  a quadratic  form  in  vec  (B).  The  differential  of  a quadratic  form 
x'Ax  is  given  by  d (s' Ax)  = x'  (A  + A')  dx  (see  Magnus  and  Neudecker,  1988, 
p.  177).  Therefore,  the  differential  of  C,-_ i is 

d Ci-i  = vec  (B)'  (ld  0 I*  + (Id  0 1*)')  dvec  (B) . 

Since  Id  0 I*  is  symmetric, 

d Ci_!  = 2vec  (B)'  (Id  0 I*)  dvec  (B) . (C.2) 

From  the  First  Identification  Theorem  (see  Magnus  and  Neudecker,  1988,  Theorem  5.11, 
p.  96),  the  row  vector  of  partial  derivatives  of  Cj_i  with  respect  to  vec  (B)  is 

DC<_1(B)  = 2vec(B)'(Id0l*). 

In  our  notation,  the  vector  of  partial  derivatives  of  C,_i  is 


Ci- 1 (B)  = (D Ci-.x  (B))'  = 2 (^  0 I*)  vec  (B)  = 2vec  (I*B) . 
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Then,  for  the  derivative  of  C (B;  9)  with  respect  to  vec(B)  we  obtain 

fc-l  k- 1 

C (B;  0)  = £ 9iCi  (B)  = Y,  29'vec  &+iB) 

*= l 

‘fc-i 

J>Shb 


i= 1 


= 2vec 


= 2vec 


2=1 
/k- 1 


V 2=1 


It  is  easy  to  see  that 


fc-i 


EWhi'C, 


2=1 


where  C is  a k x k matrix  with  first  row,  first  column  and  main  diagonal  the  vector 
(0, 0i, ... , 6k- 1),  and  Os  elsewhere,  i.e. 


C = 


( 0 0j 

0i  0i 
6k- 2 0 

V 4-i  o 


6k- 2 6k— i 

0 0 


\ 


4-2  0 

o 4_i 


Therefore, 


C (B;  9)  = 2vec  (CB) . 

From  Section  3.2  we  have  that  the  derivative  of  the  log- likelihood  is 


/(B)  - vec  (X'MU  - X'XB) . (C.3) 

Note  that  from  (C.l)  for  the  derivative  of  Lc(B;0)  with  respect  to  vec(B)  we  have 

LC(B;  9)  = / (B)  + C (B;  9) 

= vec  (X'MU  - X'XB)  + 2vec  (CB) 

= vec  (X'MU  - X'XB  + 2CB) . 
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Also,  notice  that  the  partial  derivative  of  Lc( B;  6)  with  respect  to  6 is 


Therefore,  the  score  function  of  Lc  (B;  0)  is 


S(B;0)  = 


( Lc( B-e) N 

C'i(B) 


\ Ck- 1 (B) 


To  obtain  the  Hessian  of  Lc  (B,  6)  first  we  find  the  second  differential  of  Ct . Note 
that  the  differential  of  a linear  form  x'a  is  given  by  d (x'a)  = d (x')a  (see  Magnus  and 


Neudecker,  1988,  p.  177).  Therefore,  from  (C.2)  it  follows  that  the  second  differential 
of  C*_i  is 


From  the  second  Identification  Theorem  (see  Magnus  and  Neudecker,  1988,  Theorem 
6.1.3)  and  the  symmetry  of  Id  ® I*,  it  follows  that  the  second  derivative  of  Q_i  with 
respect  to  vec  (B)  is 

Ci-i(  B)  = 2Id®I?. 

Therefore,  the  second  derivative  of  C (B;  6)  with  respect  to  vec  (B)  is 

fc-i 


Note  that  from  (C.l)  we  have  that  the  second  derivative  of  Lc  (B;  G)  with  respect  to 
vec  (B)  is 


d2Ci_i  = 2dvec  (B)'  (Id  ® I*)  dvec  (B) . 


C( B;  9)  = ]T  W d <g>  I; 


•j+i 


= 2Irf  ® C. 


Lc  (B;  0)  = l (B)  + C (B;  0) . 
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Also,  note  that  the  partial  derivative  of  Lc  (B;  9)  with  respect  to  9 is  the  (k  — 1)  x dk 


matrix 


dLc{  B;fl) 

de 


(p\  (B) , . . . , Ck~i  (B))/ . 


Therefore,  the  Hessian  of  Lc  (B;  9)  is 


H (B,  9) 


Lc  (B;  9)  W'  (B) 
v W(B)  0 J 


where  0 is  a (k  — 1)  x (k  — 1)  matrix  of  zeros. 


APPENDIX  D 

HUMAN,  GORILLA,  AND  CHIMPANZEE  SUPERIOR  FACET  DATA 


Table  D-l:  Directions  of  Superior  Facet  Major  Axis  of  19  Humans. 


Human 

Left  Superior  Facet 

X 

y 

z 

1 

-0.232626002 

0.789040592 

0.568594837 

2 

0.705640568 

0.556542936 

0.438555983 

3 

-0.216690858 

0.687352987 

0.693246669 

4 

-0.027084909 

0.725643015 

0.687538088 

5 

0.365630591 

0.51236766 

0.777041603 

6 

-0.356423178 

0.759490976 

0.54418377 

7 

-0.036137881 

0.790870284 

0.610915908 

8 

0.928698195 

0.010230272 

0.370695299 

9 

-0.600703353 

0.743725862 

0.293304149 

10 

-0.232884076 

0.691028544 

0.684283975 

11 

-0.1570887 

0.924260118 

0.347945937 

12 

-0.936953711 

0.300668534 

0.17809036 

13 

-0.846252014 

0.512442928 

0.145807316 

14 

0.22968119 

0.828180527 

0.51123729 

15 

0.007090076 

0.679000772 

0.734103319 

16 

0.608351207 

0.478174321 

0.633449389 

17 

-0.237144634 

0.820309932 

0.520436392 

18 

0.246783318 

0.761121436 

0.59982677 

19 

-0.07770557 

0.738405188 

0.669865377 
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Table  D-2:  Directions  of  Superior  Facet  Major  Axis  of  16  Gorillas. 


Left  Superior  Facet 

Gorilla 

X 

y 

z 

1 

-0.646692362 

0.551726778 

0.526674996 

2 

-0.338189756 

0.794378507 

0.504569595 

3 

-0.135651164 

0.418186882 

0.898175091 

4 

-0.563010619 

0.574591978 

0.594022812 

5 

-0.513659117 

0.740885012 

0.432716662 

6 

0.593245907 

0.678817306 

0.432742832 

7 

0.018458602 

0.726504759 

0.68691347 

8 

0.814704442 

-0.347134311 

0.464493749 

9 

-0.307661547 

0.646248905 

0.698360026 

10 

-0.228339145 

0.703064721 

0.673469548 

11 

-0.632247418 

0.536894288 

0.558576518 

12 

-0.008222793 

0.27295219 

0.961992457 

13 

0.405375303 

0.199199156 

0.89218303 

14 

-0.469248487 

0.638757471 

0.609749745 

15 

0.338822633 

0.112582819 

0.934090109 

16 

-0.747115426 

0.505394562 

0.431734729 
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Table  D -3:  Directions  of  Superior  Facet  Major  Axis  of  18  Chimpanzees. 


Chimpanzee 

Left  Superior  Facet 

X 

y 

z 

1 

0.183441427 

0.22549809 

0.956817566 

2 

0.023483611 

0.279006583 

0.960002003 

3 

-0.161715825 

0.395314006 

0.904198445 

4 

0.078223644 

0.305897152 

0.948845611 

5 

0.466868529 

0.192864426 

0.863039448 

6 

-0.885534281 

0.313710362 

-0.342658497 

7 

0.416968563 

0.27037124 

0.867776821 

8 

-0.555172151 

0.712299339 

0.42943397 

9 

-0.565670044 

0.638866781 

0.521408321 

10 

0.374428983 

0.033188592 

0.926661456 

11 

-0.029586304 

0.331367789 

0.943037666 

12 

-0.453476564 

0.513834875 

0.728239471 

13 

0.511311739 

0.012676344 

0.859301819 

14 

0.141493493 

0.392406881 

0.908843458 

15 

-0.542149037 

0.42327311 

0.725888625 

16 

-0.121767033 

0.442879973 

0.888273674 

17 

-0.394221883 

0.315210721 

0863267808 

18 

-0.52839898 

0.660934692 

0.532878832 

APPENDIX  E 
SIMULATION  RESULTS 


Testl:  10-10,  Rho=0.1 


Testl:  10-10,  Rho=0.25 


Test  1:  10-10,  Rho=0.5 


Testl:  10-10,  Rho=0.75 


Test  1:  10-10,  Rho=0.95 


Test  1:10-10,  Rho=0.99 


Figure  E-l:  P-values  from  Test  1 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (10,10). 


Testl:  10-10,  Rho=0.1 


Test  1:10-10,  Rho=0.25 


Testl  - 10-10,  Rho=0.5 


Testl:  10-10,  Rho=0.75 


Testl:  10-10,  Rho=0.95 


Testl:  10-10,  Rho=0.99 


Figure  E-2:  P-values  from  the  corrected  Test  1 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (10,10). 
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Test  1: 10-20,  Rho=0.1  Test  1: 10-20,  Rho=0.25 


Test  1: 10-20,  Rho=0.75 


Test  1: 10-20,  Rho=0.95 


Figure  E-3:  P- values  from  Test  1 (y-axis)  versus  uniform 
sizes  (10,20). 


Test  1: 10-20,  Rho=0.1  Test  1: 10-20,  Rho=0.25 


Test  1:  10-20,  Rho=0.75 


Test  1: 10-20,  Rho=0.95 


Figure  E-4:  P-values  from  the  corrected  Test  1 (y-axis) 
(x-axis);  sample  sizes  (10,20). 


Test  1: 10-20,  Rho=0.5 


0.00  0.02  0.04  0.06  0.08  0.10 


Test  1:  10-20,  Rho=0.99 


n 1 1 1 1 r 

0.00  0.02  0.04  0.06  0.08  0.10 


quantiles  (x-axis);  sample 


Test  1: 10-20,  Rho=0.5 


t i i i i r 


0.00  0.02  0.04  0.06  0.08  0.10 


Test  1:  10-20,  Rho=0.99 


n 1 1 1 1 r 

0.00  0.02  0.04  0.06  0.08  0.10 


versus  uniform  quantiles 
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Test  1:  20-20,  Rho=0.75 


Test  1 : 20-20,  Rho=0.95 


Test  1:  20-20,  Rho=0.99 


Figure  E-  5:  P- values  from  Test  1 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (20,20). 


Test  1:  20-20,  Rho=0.1 


Test  1:  20-20,  Rho=0.25 


Test  1:  20-20,  Rho=0.5 


Test  1:  20-20,  Rho=0.75  Test  1:  20-20,  Rho=0.95  Test  1:  20-20,  Rho=0.99 


Figure  E 6:  P-values  from  the  corrected  Test  1 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (20,20). 


92 


Test  1:  20-40,  Rho=0.1 


Test  1:  20-40,  Rho=0.25 


Test  1:  20-40,  Rho=0.5 


Test  1:  20-40,  Rho=0.75 


Test  1:  20-40,  Rho=0.95 


Test  1:  20-40,  Rho=0.99 


Figure  E-7:  P-values  from  Test  1 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (20,40). 


Test  1 : 20-40,  Rho=0.1  Test  1 : 20-40,  Rho=0.25  Test  1 : 20-40,  Rho=0.5 


Test  1:  20-40.  Rho=0.75 


Test  1:  20-40,  Rho=0.95 


Test  1:  20-40,  Rho=0.99 


Figure  E - 8:  P-values  from  the 
(x-axis);  sample  sizes  (20,40). 


corrected  Test  1 (y-axis)  versus  uniform  quantiles 
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Test  1:  40-40,  Rho=0.1  Test  1:  40-40,  Rho=0.25  Test  1 : 40-40,  Rho=0.5 


Test  1:  40-40,  Rho=0.75 


Test  1:  40-40,  Rho=0.95 


Test  1:  40-40,  Rho=0.99 


Figure  E -9:  P-values  from  Test  1 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (40,40). 


Test  1:40-40,  Rho=0.1 


Test  1:  40-40,  Rho=0.25 


Test  1:  40-40,  Rho=0.5 


Test  1 : 40-40,  Rho=0.75 


Test  1:  40-40,  Rho=0.95 


Test  1:  40-40,  Rho=0.99 


Figure  E-10:  P-values  from  the  corrected  Test  1 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (40,40). 
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Test  2: 10-10,  Rho=0.1  Test  2: 10-10,  Rho=0.25  Test  2: 10-10,  Rho=0.5 


Test  2: 10-10,  Rho=0.75  Test  2: 10-10,  Rho=0.95  Test  2: 10-10,  Rho=0.99 


Figure  E— 11:  P- values  from  Test  2 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (10,10). 


Test  2: 10-10,  Rho=0!  Test  2: 10-10,  Rho=0.25  Test  2: 10-10,  Rho=0.5 


Test  2: 10-10,  Rho=0.75 


Test  2:  10-10,  Rho=0.95 


Test  2:  10-10,  Rho=0.99 


Figure  E-12:  P-values  from  the  corrected  Test  2 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (10,10). 
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Test  2:  10-20,  Rho=0.1 


Test  2:  10-20,  Rho=0.25 


Test  2:  10-20,  Rho=0.5 


Test  2: 10-20,  Rho=0.75 


Test  2:  10-20,  Rho=0.95 


Test  2:  10-20,  Rho=0.99 


Figure  E- 13:  P- values  from  Test  2 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (10,20). 


Test  2: 10-20,  Rho=0.1 


Test  2:  10-20,  Rho=0.25 


Test  2: 10-20,  Rho=0.5 


Figure  E 14:  P-values  from  the  corrected  Test  2 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (10,20). 
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Test  2:  20-20,  Rho=0.75 


Test  2:  20-20,  Rho=0.95 


Test  2:  20-20,  Rho=0.99 


Figure  E— 15:  P- values  from  Test  2 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (20,20). 


Test  2:  20-20,  Rho=0.75 


Test  2:  20-20,  Rho=0.95 


Test  2:  20-20,  Rho=0.99 


Figure  E-16:  P-values  from  the  corrected  Test  2 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (20,20). 
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Test  2:  20-40,  Rho=0.1  Test  2:  20-40,  Rho=0.25  Test  2:  20-40,  Rho=0.5 


Figure  E-17:  P-values  from  Test  2 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (20,40). 


Test  2:  20-40,  Rho=0.1 


Test  2:  20-40,  Rho=0.25 


Test  2:  20-40,  Rho=0.5 


Test  2:  20-40,  Rho=0.75 


0.00  0.02  0.04  0.06  0.08  0.10 


Test  2:  20-40,  Rho=0.95 


0.00  0.02  0.04  0.06  0.08  0.10 


Test  2:  20-40,  Rho=0.99 


0.00  0.02  0.04  0.06  0.08  0.10 


Figure  E— 18:  P-values  from  the  corrected  Test  2 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (20,40). 
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Test  2:  40-40,  Rho=0.75 


Test  2:  40-40,  Rho=0.95 


Test  2:  40-40,  Rho=0.99 


Figure  E-19:  P-values  from  Test  2 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (40,40). 


Test  2:  40-40,  Rho=0.75 


Test  2:  40-40,  Rho=0.95 


Test  2:  40-40,  Rho=0.99 


Figure  E-20:  P-values  from  the  corrected  Test  2 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (40,40). 
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Test  3:  10-10,  Rho=0.1 


Test  3:  10-10,  Rho=0.25 


Test  3: 10-10,  Rho=0.5 


Test  3: 10-10,  Rho=0.75  Test  3:  10-10,  Rho=0.95  Test  3:  10-10,  Rho=0.99 


Figure  E 21:  P- values  from  Test  3 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (10,10). 


Test  3:  10-10,  Rho=0.75 


Test  3:  10-10,  Rho=0.95 


Test  3:  10-10,  Rho=0.99 


Figure  E-22:  P-values  from  the  corrected  Test  3 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (10,10). 
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Test  3:  10-20,  Rho=0.1 


Test  3:  10-20,  Rho=0.25 


Test  3: 10-20,  Rho=0.5 


Test  3:  10-20,  Rho=0.75 


Test  3:  10-20,  Rho=0.95 


Test  3:  10-20,  Rho=0.99 


Figure  E-23:  P- values  from  Test  3 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (10,20). 


Test  3: 10-20,  Rho=0.1 


Test  3:  10-20,  Rho=0.25 


Test  3: 10-20,  Rho=0.5 


Test  3:  10-20,  Rho=0.75 


Test  3:  10-20,  Rho=0.95 


Test  3:  10-20,  Rho=0.99 


Figure  E-24:  P-values  from  the  corrected  Test  3 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (10,20). 
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Test  3:  20-20,  Rho=0.1 


Test  3:  20-20,  Rho=0.25 


Test  3:  20-20,  Rho=0.5 


Test  3:  20-20,  Rho=0.75  Test  3:  20-20,  Rho=0.95  Test  3:  20-20,  Rho=0.99 


Figure  E-25:  P- values  from  Test  3 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (20,20). 


Test  3:  20-20,  Rho=0.1 


Test  3:  20-20,  Rho=0.25 


Test  3:  20-20,  Rho=0.5 


0.00  0.02  0.04  0.06  0.08  0.10 


0.00  0.02  0.04  0.06  0.08  0.10 


0.00  0.02  0.04  0.06  0.08  0.10 


Figure  E-26:  P-values  from  the  corrected  Test  3 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (20,20). 
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Test  3:  20-40,  Rho=0.1 


Test  3:  20-40,  Rho=0.25 


Test  3:  20-40,  Rho=0.5 


Test  3:  20-40,  Rho=0.75 


Test  3:  20-40,  Rho=0.95 


Test  3:  20-40,  Rho=0.99 


Figure  E-27:  P- values  from  Test  3 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (20,40). 


Test  3:  20-40,  Rho=0.1 


Test  3:  20-40,  Rho=0.25 


Test  3:  20-40,  Rho=0.5 


Test  3:  20-40,  Rho=0.75 


Test  3:  20-40,  Rho=0.95 


Test  3:  20-40,  Rho=0.99 


Figure  E -28:  P-values  from  the  corrected  Test  3 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (20,40). 
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Test  3:  40-40,  Rho=0.1 


Test  3:  40-40,  Rho=0.25 


Test  3:  40-40,  Rho=0.5 


Test  3:  40-40,  Rho=0.75 


Test  3:  40-40,  Rho=0.95 


Test  3:  40-40,  Rho=0.99 


Figure  E-29:  P- values  from  Test  3 (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (40,40). 


Test  3:  40-40,  Rho=0.1 


Test  3:  40-40,  Rho=0.25 


Test  3:  40-40,  Rho=0.5 


Figure  E-30:  P-values  from  the  corrected  Test  3 (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (40,40). 
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LRMF:  10-10,  Rho=0.1 


LRMF:  10-10,  Rho=0.25 


LRMF:  10-10.  Rho=0.5 


LRMF:  10-10,  Rho=0.75 


LRMF:  10-10,  Rho=0.95 


LRMF:  10-10,  Rho=0.99 


Figure  E 31:  P- values  from  LRMF  (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (10,10). 


LRMF:  10-10,  Rho=0.1 


LRMF:  10-10,  Rho=0.25 


LRMF:  10-10,  Rho=0.5 


LRMF:  10-10,  Rho=0.75 


LRMF:  10-10,  Rho=0.95 


LRMF:  10-10,  Rho=0.99 


Figure  E-32:  P-values  from  the  corrected  LRMF  (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (10,10). 
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LRMF:  10-20,  Rho=0.1 


LRMF:  10-20,  Rho=0.25 


LRMF:  10-20,  Rho=0.5 


LRMF:  10-20,  Rho=0.75 


LRMF:  10-20,  Rho=0.95 


LRMF:  10-20,  Rho=0.99 


Figure  E-33:  P- values  from  LRMF  (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (10,20). 


LRMF:  10-20,  Rho=0.1 


LRMF:  10-20,  Rho=0.25 


LRMF:  10-20,  Rho=0.5 


LRMF:  10-20,  Rho=0.75  LRMF:  10-20,  Rho=0.95  LRMF:  10-20,  Rho=0.99 


Figure  E-34:  P-values  from  the  corrected  LRMF  (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (10,20). 
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LRMF:  20-20,  Rho=0.1 


LRMF:  20-20,  Rho=0.25 


LRMF:  20-20,  Rho=0.5 


LRMF:  20-20,  Rho=0.75 


LRMF:  20-20,  Rho=0.95 


LRMF:  20-20,  Rho=0.99 


Figure  E-35:  P- values  from  LRMF  (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (20,20). 


LRMF:  20-20,  Rho=0.1 


LRMF:  20-20,  Rho=0.25 


LRMF:  20-20,  Rho=0.5 


LRMF:  20-20,  Rho=0.75 


LRMF:  20-20,  Rho=0.95 


LRMF:  20-20,  Rho=0.99 


Figure  E-36:  P- values  from  the  corrected  LRMF  (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (20,20). 


107 


LRMF:  20-40,  Rho=0.1 


LRMF:  20-40,  Rho=0.25 


LRMF:  20-40,  Rho=0.5 


LRMF:  20-40,  Rho=0.75 


LRMF:  20-40,  Rho=0.95 


LRMF:  20-40,  Rho=0.99 


Figure  E-37:  P-values  from  LRMF  (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (20,40). 


LRMF:  20-40,  Rho=0.1 


LRMF:  20-40,  Rho=0.25 


LRMF:  20-40,  Rho=0.5 


LRMF:  20-40,  Rho=0.75 


LRMF:  20-40,  Rho=0.95 


LR-F:  20-40,  Rho=0.99 


Figure  E-38:  P-values  from  the  corrected  LRMF  (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (20,40). 
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LRMF:  40-40,  Rho=0.1 


LRMF:  40-40,  Rho=0.25 


LRMF:  40-40,  Rho=0.5 


LRMF:  40-40,  Rho=0.75 


LRMF:  40-40,  Rho=0.95 


LRMF:  40-40,  Rho=0.99 


Figure  E-39:  P-values  from  LRMF  (y-axis)  versus  uniform  quantiles  (x-axis);  sample 
sizes  (40,40). 


LRMF:  40-40,  Rho=0.1 


LRMF:  40-40,  Rho=0.25 


LRMF:  40-40,  Rho=0.5 


LRMF:  40-40,  Rho=0.75 


LRMF:  40-40,  Rho=0.95 


LRMF:  40-40,  Rho=0.99 


Figure  E -40:  P-values  from  the  corrected  LRMF  (y-axis)  versus  uniform  quantiles 
(x-axis);  sample  sizes  (40,40). 


Table  E-l:  Proportion  of  times  the  Test  1 and  LRMF  test  statistics  exceeded  the  0.05  upper  quantile  of  the  F2-2(n-2)  distribution 
when  testing  two  groups  of  sample  sizes  10  and  10,  in  three  dimensions;  5000  simulated  values  of  the  test  statistic  were  used. 
Separ  = angular  separation  in  degrees  of  the  true  mean  vectors. 
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p = 0.99 

LRMF 

.0506 

.3604 

.9268 

.9994 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

Test  1 

.0474 

.3612 

.9280 

.9996 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

*0 

OS 

o 

II 

a 

LRMF 

.0486 

.1010 

.3074 

.5906 

.8418 
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1.0000 

Test  1 

.0506 
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.2938 

.5918 

.8476 

.9592 

.9948 

.9990 

1.0000 

p = 0.85 

LRMF 

.0530 

.0670 
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.2048 

.3394 

.5140 
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.7978 

.8894 

Test  1 

.0552 

.0676 
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.3588 

.5166 
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p = 0.75 

LRMF 

.0506 

.0604 
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.2960 

.3888 
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Test  1 

.0484 
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*0 
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LRMF 

.0532 
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.0582 
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.0958 
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p = 0.25 

LRMF 
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.0860 

.0926 

.0970 

Test  1 

.0664 

.0686 

.0734 

.0710 

.0691 

.0816 

.0966 

.0950 

.1136 

o 

II 

QL 

LRMF 

.0772 

.0742 

.0722 

.0796 

.0828 

.0794 

.0812 

.0820 

.0826 

Test  1 

.0782 

.0727 

.0790 

.0737 

.0750 

.0848 

.0758 

.0794 

.0882 
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p = 0.99 

LRMF 

.0504 

.4798 

.9808 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

Test  1 

.0466 

.4890 

.9788 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

p = 0.95 

LRMF 

.0486 

.1256 

.3842 

.7238 

.9344 

.9928 

1.0000 

1.0000 

1.0000 

Test  1 

.0480 

.1240 

.3828 

.7396 

.9374 

.9928 

.9996 

1.0000 

1.0000 

p = 0.85 

LRMF 

.0538 

.0662 

.1404 

.2718 

.4394 

.6382 

.8008 

.9152 

.9678 

Test  1 

.0482 

.0772 

.1440 

.2752 

.4694 

.6642 

.8370 

.9242 

.9700 

p = 0.75 

LRMF 

.0518 

.0632 

.0972 

.1610 

.2510 

.3812 

.5276 

.6480 

.7786 

Test  1 

.0520 

.0566 

.0990 

.1716 

.2634 

.3878 

.5446 

.6756 

.7854 

*0 

o 

II 

LRMF 

.0546 

.0558 

.0638 

.0728 

.1026 

.1456 

.1942 

.2382 

.2968 

Test  1 

.0492 

.0560 

.0684 

.0846 

.1060 

.1474 

.1894 

.2505 

.3068 

p = 0.25 

LRMF 

.0604 

.0576 

.0616 

.0676 

.0752 

.0732 

.0912 

.0960 

.1090 

Test  1 

.0618 

.0616 

.0646 

.0752 

.0740 

.0836 

.0962 

.0944 

.1052 

p = 0.1 

LRMF 

.0626 

.0628 

.0638 

.0668 

.0672 

.0706 

.0712 

.0718 

.0664 

Test  1 

.0688 

.0726 

.0656 

.0730 

.0750 

.0720 

.0688 

.0788 

.0724 

Separ 

0*00*00*00*00 
1— 1 H cq  CM  CO  CO  Tj* 

Table  E -3:  Proportion  of  times  the  Test  1 and  LRMF  test  statistics  exceeded  the  0.05  upper  quantile  of  the  F2,2(n-2)  distribution 
when  testing  two  groups  of  sample  sizes  20  and  20,  in  three  dimensions;  5000  simulated  values  of  the  test  statistic  were  used. 
Separ  = angular  separation  in  degrees  of  the  true  mean  vectors. 
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p = 0.99 

LRMF 

.0502 

.6766 

.9994 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

Test  1 

.0472 

.6726 

.9992 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

p = 0.95 

LRMF 

.0446 

.1696 

.5520 

.8906 

.9936 

.9990 

1.0000 

1.0000 

1.0000 

Test  1 

.0502 

.1802 

.5616 

.8968 

.9906 

.9998 

1.0000 

1.0000 

1.0000 

p = 0.85 

LRMF 

.0462 

.0838 

.1974 

.3916 

.6262 

.8264 

.9350 

.9858 

.9980 

Test  1 

.0510 

.0830 

.2030 

.4110 

.6454 

.8472 

.9484 

.9864 

.9984 

p = 0.75 

LRMF 

.0538 

.0688 

.1210 

.2228 

.3596 

.5440 

.6994 

.8362 

.9232 

Test  1 

.0490 

.0666 

.1312 

.2444 

.3868 

.5558 

.7350 

.8670 

.9338 

LO 

d 

II 

a 

LRMF 

.0534 

.0588 

.0762 

.0980 

.1372 

.2044 

.2566 

.3360 

.4256 

Test  1 

.0504 

.0584 

.0726 

.0958 

.1468 

.1922 

.2640 

.3484 

.4354 

p = 0.25 

LRMF 

.0590 

.0632 

.0698 

.0606 

.0806 

.0900 

.1046 

.1174 

.1364 

Test  1 

.0644 

.0680 

.0600 

.0682 

.0868 

.0956 

.1062 

.1182 

.1400 

d 

II 

<3- 

LRMF 

.0722 

.0750 

.0748 

.0798 

.0824 

.0828 

.0800 

.0894 

.0934 

Test  1 

.0704 

.0778 

.0794 

.0820 

.0818 

.0832 

.0880 

.0886 

.0896 
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66  0 = d 

LRMF 

.0510 

.8024 

.9998 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

Test  1 

.0516 

.8006 

.9998 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

LO 

o> 

d 

II 

a 

LRMF 

.0554 

.2154 

.6892 

.9650 

.9994 

1.0000 

1.0000 

1.0000 

1.0000 

Test  1 

.0488 

.2114 

.6916 

.9654 

.9994 

1.0000 

1.0000 

1.0000 

1.0000 

LO 

00 

d 

II 

CD 

LRMF 

.0522 

.0968 

.2534 

.5186 

.7760 

.9190 

.9818 

.9966 

1.0000 

Test  1 

.0546 

.0976 

.2628 

.5310 

.7908 

.9420 

.9858 

.9982 

1.0000 

p = 0.75 

LRMF 

.0558 

.0740 

.1508 

.2956 

.4730 

.6734 

.8334 

.9388 

.9792 

Test  1 

.0450 

.0680 

.1522 

.3116 

.5108 

.7054 

.8600 

.9462 

.9844 

iO 

d 

II 

LRMF 

.0520 

.0564 

.0864 

.1120 

.1766 

.2496 

.3332 

.4494 

.5410 

Test  1 

.0506 

.0588 

.0750 

.1212 

.1764 

.2526 

.3574 

.4614 

.5652 

LO 

CN 

d 

1 

LRMF 

.0608 

.0598 

.0592 

.0746 

.0824 

.0990 

.1114 

.1344 

.1730 

Test  1 

.0542 
.0630 
.0650 
.0774 
.0840 
.0913 
.1124 
.1  103 
.1582 

d 

II 

a 

LRMF 

.0674 

.0686 

.0660 

.0634 

.0714 

.0710 

.0724 

.0866 

.0812 

Test  1 

.0708 

.0650 

.0681 

.0670 

.0646 

.0746 

.0776 

.0834 

.0853 
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p = 0.99 

LRMF 

.0522 

.9386 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

Test  1 

.0520 

.9416 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

p = 0.95 

LRMF 

.0494 

.3070 

.8662 

.9972 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

Test  1 

.0510 

.3074 

.8712 

.9962 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

tO 

00 

O 

II 

LRMF 

.0502 

.1174 

.3538 

.6910 

.9150 

.9898 

.9992 

.9998 

1.0000 

Test  1 

.0468 

.1252 

.3726 

.7180 

.9354 

.9912 

.9998 

1.0000 

1.0000 

lO 

o 

II 

Q. 

LRMF 

.0468 

.0832 

.1990 

.4160 

.6576 

.8488 

.9524 

.9908 

.9976 

Test  1 

.0550 

.0824 

.2148 

.4382 

.7034 

.8744 

.9648 

.9928 

.9994 

lO 

o 

II 

a 

LRMF 

.0542 

.0636 

.0944 

.1610 

.2462 

.3602 

.4732 

.6184 

.7208 

Test  1 

.0480 

.0624 

.0964 

.1610 

.2442 

.3696 

.4918 

.6260 

.7470 

p = 0.25 

LRMF 

.0574 

.0628 

.0722 

.0796 

.0940 

.1144 

.1490 

.1778 

.2262 

Test  1 

.0582 

.0576 

.0672 

.0766 

.0906 

.1158 

.1490 

.1812 

.2216 

O 

II 

a 

LRMF 

.0756 

.0744 

.0714 

.0776 

.0856 

.0958 

.0948 

.0978 

.0968 

Test  1 

.0770 

.0770 

.0814 

.0828 

.0856 

.0912 

.0868 

.0952 

.1004 
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Statistical  methods  for  analyzing  directions  in  space,  or  equivalently  points  on  a 
circle  or  a sphere,  are  needed  in  many  fields,  including  Earth  Sciences,  Astrophysics, 
Biology,  and  Anthropology.  In  order  to  achieve  reliable  results  from  the  analysis,  it  is 
necessary  that  these  methods  take  into  account  the  special  structure  of  the  data.  It 
is  also  very  important  to  select  a model  that  describes  the  data  well  and  at  the  same 
time  is  relatively  easy  to  fit.  In  this  work  we  investigate  the  spherically  projected 
multivariate  linear  model  for  directional  data  in  the  general  d-dimensional  case,  which 
of  course  includes  the  circular  and  spherical  cases.  We  demonstrate  the  flexibility  of 
the  model,  and  show  how  it  can  be  fit  using  several  different  approaches. 

Researchers  often  want  to  determine  whether  groups  of  observations,  coming 
from  different  sources  or  populations,  exhibit  different  characteristics.  Such  determinations 
are  usually  made  by  applying  an  appropriate  statistical  test  of  hypotheses.  Under 
the  SPML  model,  we  have  developed  three  different  tests  of  hypotheses  for  the 
multisample  problem.  The  performance  of  these  tests  has  been  evaluated  using 
simulated  data.  We  also  demonstrate  the  applications  of  the  tests  in  an  example 
with  three-dimensional  anthropological  data. 

Sometimes  it  is  not  possible  to  sample  from  all  the  populations  of  interest. 
Appropriate  models  in  such  situations  are  the  so  called  random  effects  models.  Using 
the  SPML  model  we  have  developed  three  different  such  models  with  increasing 
complexity.  We  have  considered  four  different  methods  for  fitting  the  models  and 
illustrated  their  implementation  in  an  example  with  two-dimensional  simulated  data. 


